Context Bloat: How I Accidentally Built OpenClaw 2.0

#ai #hermes #tokens #optimization #mcp #til

I didn’t see it coming

I left OpenClaw because it was slow from the start — bloated with tools and plugins before I’d even done anything meaningful with it. Not gradual accumulation. Just day one overhead that never went away.

Hermes was supposed to be different. Fast. Lean. Local-friendly.

The moment I noticed

I was doing a routine check — just “hi”. Token visibility was on. The number that came back:

Tokens: 32,218
Model: MiniMax-M2.7

For “hi.”

That’s when I started digging.

Why it was so bad

I use Hermes across Discord, CLI, and Telegram. The natural instinct when you get a new agent is to connect everything you might need.

So I had:

  • Todoist MCP — 48 tools for task management
  • ZAI Vision MCP — 12 tools for image analysis
  • ZAI Web Reader MCP — 5 tools for reading web pages
  • Context7 MCP — 6 tools for documentation lookup
  • A collection of built-in tools: browser, terminal, memory, file, web, delegation, and more

Each tool has a schema — a machine-readable description of what it does and what parameters it takes. The LLM needs these to know what’s possible. They get sent on every single API call.

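The math sneaks up on you. One tool schema might be 300 characters. Fine. But 71 tools later, you’re at roughly 21,000 characters just for the tool registry. At around four characters per token, that’s ~5,000 tokens before you’ve said hello. And 300 characters is the optimistic case: real schemas with long descriptions and nested parameters run much larger, which is how mine ended up at ~18,700 tokens.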
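The arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: the four-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and `make_tool` builds hypothetical placeholder schemas padded to 300 characters each, not the actual schemas from my setup.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_schema_tokens(tools):
    """Approximate the tokens spent on tool schemas for one API call."""
    total_chars = sum(len(json.dumps(tool)) for tool in tools)
    return total_chars // CHARS_PER_TOKEN


def make_tool(i, target_chars=300):
    """Build a hypothetical schema whose JSON form is target_chars long."""
    tool = {"name": f"tool_{i:02d}", "description": ""}
    padding = target_chars - len(json.dumps(tool))
    tool["description"] = "x" * padding
    return tool


# 71 tools, matching the count across the four MCP servers listed earlier.
tools = [make_tool(i) for i in range(71)]
print(estimate_schema_tokens(tools))
```

Swapping the placeholder list for the real schemas your agent sends gives a quick first look at how much of each call is overhead.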

What it was doing to me

This is the part I didn’t notice at first. The bloat wasn’t just slow — it was degrading the quality of everything.

LLMs have context windows. When a window fills up, and it fills up fast when 32K tokens are already spoken for on every turn, the system does one of two things:

Compression — the agent summarises recent conversation into a shorter form. Useful, but lossy. Every compression pass loses nuance. Exact quotes become paraphrases. Specific details become generalities.

Truncation — older messages get dropped entirely when space runs out. You can’t recover them.
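A minimal sketch of the truncation case shows why fixed overhead matters. It is not how Hermes actually manages history; `truncate_history`, the budget numbers, and the characters-divided-by-four cost estimate are all assumptions made for illustration.

```python
def truncate_history(messages, budget, fixed_overhead):
    """Keep the newest messages that fit once fixed overhead is paid.

    fixed_overhead stands for tool schemas plus system prompt, in tokens.
    """
    room = budget - fixed_overhead
    kept = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = len(msg) // 4 + 1  # assumption: ~4 characters per token
        if used + cost > room:
            break  # everything older than this is dropped
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Ten hypothetical messages of 400 characters each (about 101 tokens).
messages = [f"{i:03d}" + "x" * 397 for i in range(10)]

lean = truncate_history(messages, budget=1000, fixed_overhead=200)
bloated = truncate_history(messages, budget=1000, fixed_overhead=700)
print(len(lean), len(bloated))  # 7 2
```

Same window, same conversation: the only thing that changed is the overhead, and the bloated setup remembers two messages where the lean one remembers seven.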

The first sign was subtle. I’d ask something specific (“what was the git command we used yesterday for the force push?”) and the agent would hedge: “Something like…” instead of “you ran git push --force”. The detail was gone, either compressed out or never retained because the context was too crowded with tool schemas to begin with.

I thought the model was getting worse. It wasn’t. The context was just full of junk.

The fix

I disabled three MCP servers. It took about five minutes.

  • Todoist MCP — gone. I use the Todoist REST API directly via curl, which the agent already knows how to do.
  • ZAI Vision MCP — gone. Image analysis goes through the ZAI tool directly, not as a persistent MCP server.
  • ZAI Web Reader MCP — gone. Web content comes through the agent’s built-in web tools.

The result:

Tool tokens saved: ~18,700

Exactly three MCP servers disabled. Five minutes. No functionality lost.

Here’s where those tokens were hiding:

Component        Before (tokens)    After (tokens)
Tool schemas     ~18,700            ~12,700
System prompt    ~3,800             ~3,800
Conversation     ~9,700             ~700
Total            ~32,200            ~17,200

The system prompt didn’t shrink at all — it was already lean. The entire win came from removing tool schemas. Todoist alone was 13,000 of those tokens. I use it maybe twice a week.

The session that followed

The difference was immediate. The agent was snappy. More importantly, the conversation history was clean. When I asked about something from earlier in the session, it didn’t hedge — it answered precisely. Not because the model changed, but because there was finally room in the context window for the conversation instead of just the infrastructure.

Details stuck. Quotes stayed exact. The agent remembered the specific git command, the exact file path, the particular API flag I’d used. All the things compression and truncation had been eating away at.

The lesson

I built OpenClaw 2.0 without meaning to. The accumulation was gradual — each new tool felt small, each MCP server seemed useful, each capability justified in isolation.

The problem isn’t adding tools. The problem is adding them all permanently, all at once, into the startup cost of every single turn.

The right model is on-demand, not always-on. Tools you use every conversation — terminal, file read, web search — stay. Tools you use occasionally — image analysis, Todoist, documentation lookups — should load when called, not at startup.
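Here is one way the on-demand idea could look, as a sketch. `ToolRegistry` and its methods are hypothetical names, not the API of Hermes or any MCP client, and `load_todoist_schemas` is a stand-in for whatever would actually fetch schemas from a server.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry: core tools always sent, groups sent once activated."""

    def __init__(self) -> None:
        self._always_on: Dict[str, dict] = {}
        self._loaders: Dict[str, Callable[[], List[dict]]] = {}
        self._loaded: Dict[str, List[dict]] = {}

    def register_always_on(self, schema: dict) -> None:
        self._always_on[schema["name"]] = schema

    def register_on_demand(self, group: str, loader: Callable[[], List[dict]]) -> None:
        # Only the loader is stored; no schemas are fetched yet.
        self._loaders[group] = loader

    def activate(self, group: str) -> None:
        # Fetch the group's schemas the first time it is needed.
        if group not in self._loaded:
            self._loaded[group] = self._loaders[group]()

    def schemas_for_call(self) -> List[dict]:
        """Schemas to attach to the next API call."""
        schemas = list(self._always_on.values())
        for group_schemas in self._loaded.values():
            schemas.extend(group_schemas)
        return schemas


def load_todoist_schemas() -> List[dict]:
    # Stand-in for asking a Todoist MCP server for its 48 tool schemas.
    return [{"name": f"todoist_{i}"} for i in range(48)]


registry = ToolRegistry()
for name in ("terminal", "file_read", "web_search"):
    registry.register_always_on({"name": name})
registry.register_on_demand("todoist", load_todoist_schemas)

print(len(registry.schemas_for_call()))  # 3
registry.activate("todoist")
print(len(registry.schemas_for_call()))  # 51
```

The design choice is that a “hi” only pays for the three core tools. The 48 Todoist schemas enter the context on the turn that actually needs them, and a fuller version would also drop them again afterwards.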

Most frameworks make this hard. They assume you want everything available always. That’s fine for a demo. It’s a slow leak in production.

I now keep token counts visible. Every session. If a “hi” is taking 30K+ tokens, something is wrong — and it usually means something will get compressed or truncated before the conversation gets interesting.
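That check is easy to automate. The sketch below assumes you can read the prompt token count for a trivial message from wherever your agent reports usage; `baseline_warning` and the 30,000 default are illustrative choices, not part of any framework.

```python
def baseline_warning(prompt_tokens, limit=30_000):
    """Return a warning if a trivial prompt costs `limit` tokens or more."""
    if prompt_tokens >= limit:
        return (
            f"baseline is {prompt_tokens:,} tokens (limit {limit:,}): "
            "check which tool schemas are always on"
        )
    return None


print(baseline_warning(32_218))  # the "hi" from before the cleanup
print(baseline_warning(17_200))  # the total after it
```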

The agent should spend its context budget on your work, not on itself.