Local AI Assistant Optimization
As outlined in my previous post, I started running a personal assistant via OpenClaw on a dedicated Mac Mini. The goal is to support development, shipping, marketing, and advisory workflows around the WTF-model (aka “The Fractal”).
Over the past weeks, alongside my regular work, I also earned the Azure AI Engineer Associate certification. It gave me useful context and a deeper understanding of RAG, embeddings, and multimodality, which directly influenced the improvements described below.
Configuration Refactoring
The central config (openclaw.json) gradually became too large and difficult to maintain.
To address this, I leveraged OpenClaw’s support for importing external JSON5 files and split the configuration into smaller, modular components.
Result:
- Better structure
- Easier maintenance
- Cleaner separation of concerns
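As a rough sketch, the split could look like the following. OpenClaw's actual import syntax isn't shown here, so the `import(...)` notation, key names, and file names are purely illustrative:

```json5
// openclaw.json – top-level config after the split
// (import syntax and file layout are illustrative, not OpenClaw's exact schema)
{
  agents:   import("./config/agents.json5"),    // agent definitions
  triggers: import("./config/jobs.json5"),      // cron / heartbeat / webhook jobs
  channels: import("./config/channels.json5"),  // Discord, email, SMS
  logging:  import("./config/logging.json5"),   // log targets and levels
}
```

Each concern lives in its own small file, so a change to, say, the trigger schedule no longer touches the agent definitions.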
LaunchAgents & System Processes
With configuration cleaned up, the next step was reorganizing LaunchAgents (executed at system boot or user login).
These handle:
- Indexing for local Qdrant RAG
- Emails
- SMS
- Knowledge base
- GitHub code mirror
- Polling for new SMS and emails
I standardized their format and introduced centralized logging in ~/log/, making debugging significantly easier and more consistent.
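In the standardized format, one of these LaunchAgents could look roughly like this; the label, script path, and log file names are placeholders of my own, not the actual files:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>local.assistant.mail-poll</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/me/scripts/poll-mail.sh</string>
  </array>
  <!-- run every 5 minutes -->
  <key>StartInterval</key>
  <integer>300</integer>
  <!-- centralized logging under ~/log/ -->
  <key>StandardOutPath</key>
  <string>/Users/me/log/mail-poll.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/me/log/mail-poll.err.log</string>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
```

Pointing `StandardOutPath` and `StandardErrorPath` into `~/log/` is what makes the centralized debugging possible: every agent writes to the same predictable location.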
Extending OpenClaw: Skills vs Plugins
OpenClaw offers two extension layers:
- Skills → high-level, Markdown-based behavior definitions
- Plugins → low-level, tightly integrated code
To improve efficiency, I moved critical paths into Plugins:
- Hardcoded email and SMS retrieval
- Detailed activity logging
This avoids unnecessary LLM usage for deterministic tasks.
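The idea can be sketched as a simple dispatch layer: deterministic task types go straight to a plugin handler, and only everything else reaches the model. The handler names and task taxonomy below are my own illustration, not OpenClaw's plugin API:

```python
# Sketch: route deterministic tasks to plugin handlers instead of the LLM.
# Task types and handler names are illustrative, not OpenClaw's actual API.
from typing import Callable, Dict


def fetch_email(task: dict) -> str:
    # Deterministic retrieval (e.g. an IMAP poll) -- no model call needed.
    return f"fetched email for {task['account']}"


def fetch_sms(task: dict) -> str:
    return f"fetched SMS for {task['number']}"


def call_llm(task: dict) -> str:
    # Placeholder for the actual model call.
    return f"LLM handles {task['type']}"


PLUGIN_HANDLERS: Dict[str, Callable[[dict], str]] = {
    "email.fetch": fetch_email,
    "sms.fetch": fetch_sms,
}


def dispatch(task: dict) -> str:
    handler = PLUGIN_HANDLERS.get(task["type"])
    if handler is not None:
        return handler(task)  # plugin path: no tokens spent
    return call_llm(task)     # open-ended work still goes to the model
```

A retrieval task like `dispatch({"type": "email.fetch", "account": "ops"})` never touches the LLM, while anything without a registered handler falls through to it.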
On top of that, I created a number of custom Skills for:
- Backups
- Organisation
- Operations
- and more complex workflows
These enforce structured templates, ensuring consistent and properly formatted outputs.
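A Skill of this kind might look like the following sketch; the file name, section headings, and fields are hypothetical, not OpenClaw's exact schema:

```markdown
<!-- backup.skill.md – illustrative layout, not OpenClaw's actual format -->
# Skill: Nightly Backup

## Trigger
Runs when the orchestrator issues a `backup` task.

## Steps
1. Snapshot the local Qdrant collections.
2. Mirror the knowledge base to the GitHub mirror repo.
3. Append a summary to ~/log/backup.log.

## Output template
- Status: success | partial | failed
- Duration: <minutes>
- Items backed up: <count>
```

Because the output template is part of the Skill itself, every run produces the same structured report regardless of how the model phrases its reasoning.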
Agent Architecture Simplification
With the system foundation stabilized, I also decided to slim the agent setup down to what's really needed:
7 → 5 agents
The system now follows a semi-isolated orchestrator model:
- Agents communicate with one another via Discord
- One group meeting per day
- Additionally, each agent has a dedicated 1:1 session with the orchestrator
Each agent operates based on a Mission Skill, defining:
- Its primary objective
- How it executes tasks
- How it participates in meetings
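A Mission Skill along these lines could be sketched as follows; the agent name, headings, and field wording are illustrative, not the actual files:

```markdown
<!-- mission.research.md – illustrative Mission Skill, not the actual file -->
# Mission: Research Agent

## Objective
Track developments relevant to the WTF-model and summarize them weekly.

## Execution
- Work through tasks from the orchestrator queue, one at a time.
- Log every completed task via the activity-logging plugin.

## Meetings
- Daily group meeting: report status and blockers in at most 3 bullet points.
- 1:1 with the orchestrator: review progress against the shared milestones.
```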
Additionally, all agents orient toward a shared overarching goal, whose milestones are coordinated between the orchestrator and the supervisor.
Trigger System & Automation
To automate workflows, OpenClaw provides three trigger sources:
- cron → scheduled jobs
- heartbeat → higher-level periodic triggers
- webhook → on-demand execution
I configured jobs.json with detailed cron jobs for:
- Cleanup
- Backups
- Synchronization
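A trimmed-down jobs.json along these lines could look as follows; the schema, job names, and task identifiers are illustrative, since the post doesn't show the actual file:

```json5
// jobs.json – cron-style job definitions (schema is illustrative)
{
  jobs: [
    { name: "cleanup", schedule: "0 3 * * *",    task: "fs.cleanup" },  // daily, 03:00
    { name: "backup",  schedule: "30 3 * * *",   task: "backup.run" },  // daily, 03:30
    { name: "sync",    schedule: "*/30 * * * *", task: "repo.sync"  },  // every 30 min
  ],
}
```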
Additionally, I enabled webhooks so that external scripts and LaunchAgents can trigger tasks on demand, specifically a 5-minute interval check for new SMS and emails.
Local LLM Offloading
While GPT-5.1 (OpenAI API) remains the primary model due to its strong price-performance ratio, it is not optimal for all workloads.
For maintenance-heavy cron jobs (e.g. cleanup and triage), I introduced a local model.
With Gemma 4 and Ollama 0.20 (MLX support), this became practical.
Deployment details:
- Model: 26B Mixture-of-Experts
- Quantization: 4-bit weights, 4-bit KV cache
Initial benchmarks showed:
- 30–35 tokens/sec
For production, I settled on a 64k context window, which remains sufficient for local reasoning and repetitive tasks while lowering memory pressure.
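A back-of-the-envelope calculation shows why the 4-bit quantization and the capped context matter. The 26B parameter count and 4-bit precision come from the setup above; the layer and head dimensions below are placeholder assumptions, since the model's exact architecture isn't specified here:

```python
# Rough memory estimate for a 26B model with 4-bit weights and 4-bit KV cache.
# Layer/head dimensions are ASSUMED values for illustration only.
BYTES_PER_4BIT = 0.5

# Weights: 26B parameters at 4 bits each -> ~13 GB.
params = 26e9
weights_gb = params * BYTES_PER_4BIT / 1e9

# KV cache at a 64k context, also quantized to 4 bits.
layers, kv_heads, head_dim = 48, 8, 128   # assumed architecture
context = 64 * 1024
kv_bytes = 2 * layers * kv_heads * head_dim * context * BYTES_PER_4BIT
kv_gib = kv_bytes / 2**30                 # ~3 GiB under these assumptions

print(f"weights ≈ {weights_gb:.1f} GB, KV cache ≈ {kv_gib:.1f} GiB")
```

The KV cache grows linearly with context length, so capping it at 64k instead of, say, 128k halves that component outright, which is exactly the memory pressure the decision trades against.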
System utilization:
- Embedding model: ~3 GB
- Total RAM usage (M4): ~80–85%
This setup now strikes a good balance between performance, cost, and stability.