Local AI Assistant Optimization
As outlined in my previous post, I started running a personal assistant via OpenClaw on a dedicated Mac Mini. The goal is to support development, shipping, marketing, and advisory workflows around the WTF-model (aka “The Fractal”).
Over the past weeks, alongside my regular work, I also earned the Azure AI Engineer Associate certification. It gave me useful context and a deeper understanding of RAG, embeddings, and multimodality, which directly influenced the improvements described below.
Configuration Refactoring
The central config (openclaw.json) gradually became too large and difficult to maintain.
To address this, I leveraged OpenClaw’s support for importing external JSON5 files and split the configuration into smaller, modular components.
Result:
- Better structure
- Easier maintenance
- Cleaner separation of concerns
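As a rough sketch, the split could look like the following. OpenClaw's actual import syntax isn't shown here, so the `import(...)` notation, key names, and file names are purely illustrative:

```json5
// openclaw.json – top-level config after the split
// (import syntax and file layout are illustrative, not OpenClaw's exact schema)
{
  agents:   import("./config/agents.json5"),    // agent definitions
  triggers: import("./config/jobs.json5"),      // cron / heartbeat / webhook jobs
  channels: import("./config/channels.json5"),  // Discord, email, SMS
  logging:  import("./config/logging.json5"),   // log targets and levels
}
```

Each concern lives in its own small file, so a change to, say, the trigger schedule no longer touches the agent definitions.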
LaunchAgents & System Processes
With configuration cleaned up, the next step was reorganizing LaunchAgents (executed at system boot or user login).
These handle:
- Indexing for local Qdrant RAG
- Emails
- SMS
- Knowledge base
- GitHub code mirror
- Polling for new SMS and emails
I standardized their format and introduced centralized logging in ~/log/, making debugging significantly easier and more consistent.
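In the standardized format, one of these LaunchAgents could look roughly like this; the label, script path, and log file names are placeholders of my own, not the actual files:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>local.assistant.mail-poll</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/me/scripts/poll-mail.sh</string>
  </array>
  <!-- run every 5 minutes -->
  <key>StartInterval</key>
  <integer>300</integer>
  <!-- centralized logging under ~/log/ -->
  <key>StandardOutPath</key>
  <string>/Users/me/log/mail-poll.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/me/log/mail-poll.err.log</string>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
```

Pointing `StandardOutPath` and `StandardErrorPath` into `~/log/` is what makes the centralized debugging possible: every agent writes to the same predictable location.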
Extending OpenClaw: Skills vs Plugins
OpenClaw offers two extension layers:
- Skills → high-level, Markdown-based behavior definitions
- Plugins → low-level, tightly integrated code
To improve efficiency, I moved critical paths into Plugins:
- Hardcoded email and SMS retrieval
- Detailed activity logging
This avoids unnecessary LLM usage for deterministic tasks.
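The idea can be sketched as a simple dispatch layer: deterministic task types go straight to a plugin handler, and only everything else reaches the model. The handler names and task taxonomy below are my own illustration, not OpenClaw's plugin API:

```python
# Sketch: route deterministic tasks to plugin handlers instead of the LLM.
# Task types and handler names are illustrative, not OpenClaw's actual API.
from typing import Callable, Dict


def fetch_email(task: dict) -> str:
    # Deterministic retrieval (e.g. an IMAP poll) -- no model call needed.
    return f"fetched email for {task['account']}"


def fetch_sms(task: dict) -> str:
    return f"fetched SMS for {task['number']}"


def call_llm(task: dict) -> str:
    # Placeholder for the actual model call.
    return f"LLM handles {task['type']}"


PLUGIN_HANDLERS: Dict[str, Callable[[dict], str]] = {
    "email.fetch": fetch_email,
    "sms.fetch": fetch_sms,
}


def dispatch(task: dict) -> str:
    handler = PLUGIN_HANDLERS.get(task["type"])
    if handler is not None:
        return handler(task)  # plugin path: no tokens spent
    return call_llm(task)     # open-ended work still goes to the model
```

A retrieval task like `dispatch({"type": "email.fetch", "account": "ops"})` never touches the LLM, while anything without a registered handler falls through to it.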
On top of that, I created a number of custom Skills for:
- Backups
- Organisation
- Operations
- and more complex workflows
These enforce structured templates, ensuring consistent and properly formatted outputs.
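A Skill of this kind might look like the following sketch; the file name, section headings, and fields are hypothetical, not OpenClaw's exact schema:

```markdown
<!-- backup.skill.md – illustrative layout, not OpenClaw's actual format -->
# Skill: Nightly Backup

## Trigger
Runs when the orchestrator issues a `backup` task.

## Steps
1. Snapshot the local Qdrant collections.
2. Mirror the knowledge base to the GitHub mirror repo.
3. Append a summary to ~/log/backup.log.

## Output template
- Status: success | partial | failed
- Duration: <minutes>
- Items backed up: <count>
```

Because the output template is part of the Skill itself, every run produces the same structured report regardless of how the model phrases its reasoning.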
Agent Architecture Simplification
With the system foundation stabilized, I also decided to slim the agent setup down to what's really needed:
7 → 5 agents
The system now follows a semi-isolated orchestrator model:
- Agents communicate with one another via Discord
- One group meeting per day
- Additionally, each agent has a dedicated 1:1 session with the orchestrator
Each agent operates based on a Mission Skill, defining:
- Its primary objective
- How it executes tasks
- How it participates in meetings
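A Mission Skill along these lines could be sketched as follows; the agent name, headings, and field wording are illustrative, not the actual files:

```markdown
<!-- mission.research.md – illustrative Mission Skill, not the actual file -->
# Mission: Research Agent

## Objective
Track developments relevant to the WTF-model and summarize them weekly.

## Execution
- Work through tasks from the orchestrator queue, one at a time.
- Log every completed task via the activity-logging plugin.

## Meetings
- Daily group meeting: report status and blockers in at most 3 bullet points.
- 1:1 with the orchestrator: review progress against the shared milestones.
```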
Additionally, all agents orient toward a shared overarching goal, whose milestones are coordinated between the orchestrator and the supervisor.
Trigger System & Automation
To automate workflows, OpenClaw provides three trigger sources:
- cron → scheduled jobs
- heartbeat → higher-level periodic triggers
- webhook → on-demand execution
I configured jobs.json with detailed cron jobs for:
- Cleanup
- Backups
- Synchronization
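A trimmed-down jobs.json along these lines could look as follows; the schema, job names, and task identifiers are illustrative, since the post doesn't show the actual file:

```json5
// jobs.json – cron-style job definitions (schema is illustrative)
{
  jobs: [
    { name: "cleanup", schedule: "0 3 * * *",    task: "fs.cleanup" },  // daily, 03:00
    { name: "backup",  schedule: "30 3 * * *",   task: "backup.run" },  // daily, 03:30
    { name: "sync",    schedule: "*/30 * * * *", task: "repo.sync"  },  // every 30 min
  ],
}
```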
Additionally, I enabled webhooks so that external scripts and LaunchAgents can trigger tasks on demand, specifically a 5-minute interval check for new SMS and emails.
Local LLM Offloading
While GPT-5.1 (OpenAI API) remains the primary model due to its strong price-performance ratio, it is not optimal for all workloads.
For maintenance-heavy cron jobs (e.g. cleanup and triage), I introduced a local model.
With Gemma 4 and Ollama 0.20 (MLX support), this became practical.
Deployment details:
- Model: 26B Mixture-of-Experts
- Quantization: 4-bit weights, 4-bit KV cache
Initial benchmarks showed:
- 30–35 tokens/sec
For production, I settled on a 64k context window, which remains sufficient for local reasoning and repetitive tasks while lowering memory pressure.
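A back-of-the-envelope calculation shows why the 4-bit quantization and the capped context matter. The 26B parameter count and 4-bit precision come from the setup above; the layer and head dimensions below are placeholder assumptions, since the model's exact architecture isn't specified here:

```python
# Rough memory estimate for a 26B model with 4-bit weights and 4-bit KV cache.
# Layer/head dimensions are ASSUMED values for illustration only.
BYTES_PER_4BIT = 0.5

# Weights: 26B parameters at 4 bits each -> ~13 GB.
params = 26e9
weights_gb = params * BYTES_PER_4BIT / 1e9

# KV cache at a 64k context, also quantized to 4 bits.
layers, kv_heads, head_dim = 48, 8, 128   # assumed architecture
context = 64 * 1024
kv_bytes = 2 * layers * kv_heads * head_dim * context * BYTES_PER_4BIT
kv_gib = kv_bytes / 2**30                 # ~3 GiB under these assumptions

print(f"weights ≈ {weights_gb:.1f} GB, KV cache ≈ {kv_gib:.1f} GiB")
```

The KV cache grows linearly with context length, so capping it at 64k instead of, say, 128k halves that component outright, which is exactly the memory pressure the decision trades against.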
System utilization:
- Embedding model: ~3 GB
- Total RAM usage (M4): ~80–85%
This setup now strikes a good balance between performance, cost, and stability.