Budget

Save money. Same agent.

Eight practical ways to cut agent cost or context waste. Five are installable tools; three are pure techniques.



8 ways to save

# | name | source | proof

1

caveman (general, tool)

Use when you want Claude Code (or Codex/Gemini/Cursor/30+ agents) to cut ~75% of output tokens by replying in fragment/telegraphic style — full technical accuracy, smaller mouth.

github.com

JuliusBrussee/caveman

inspect ★ 59k

2

free-claude-code (general, tool)

Use when you want Claude Code to route its Anthropic Messages API traffic through a local proxy to free or cheap providers — NVIDIA NIM, Kimi, Wafer, OpenRouter, DeepSeek, LM Studio, llama.cpp, or Ollama.

github.com

Alishahryar1/free-claude-code

inspect ★ 24k

3

claude-code-router (general, tool)

Put OpenRouter in front of Claude Code: one API key reaches 300+ models, so the easy turns route to a backend at 2–5% of Sonnet's price while the hard ones stay on a frontier model — and spend is visible per request from one dashboard.

github.com/musistudio

musistudio/claude-code-router

open ★ 35k
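
The split between easy and hard turns can be sketched as a small routing function. This is a minimal illustration, not claude-code-router's actual configuration or API: the backend names, the 3% relative price, and the `is_hard` heuristic are all assumptions invented for the example. Only the 2–5% price range comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str              # hypothetical model identifier
    relative_price: float  # price relative to the frontier model (1.0)


# Assumed backends: a frontier model and a cheap one priced at 3% of it,
# which sits inside the 2-5% range quoted in the description.
FRONTIER = Backend("frontier-model", 1.0)
CHEAP = Backend("cheap-model", 0.03)


def is_hard(prompt: str) -> bool:
    """Toy heuristic (an assumption): long or design-heavy prompts count as hard."""
    keywords = ("architecture", "refactor", "race condition", "design")
    return len(prompt) > 2000 or any(k in prompt.lower() for k in keywords)


def route(prompt: str) -> Backend:
    """Send hard turns to the frontier model and everything else to the cheap one."""
    return FRONTIER if is_hard(prompt) else CHEAP


def blended_price(prompts: list[str]) -> float:
    """Average relative price per turn, assuming every turn uses equal tokens."""
    if not prompts:
        return 0.0
    return sum(route(p).relative_price for p in prompts) / len(prompts)


if __name__ == "__main__":
    turns = ["rename this variable"] * 9 + ["refactor the auth architecture"]
    print(f"{blended_price(turns):.3f}")  # prints 0.127
```

With nine easy turns and one hard one, the blended price is about 13% of sending everything to the frontier model, under the equal-token assumption. A real router would likely use a better signal than keywords, but the arithmetic is the same.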

4

headroom (general, tool)

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM — 60–95% fewer tokens with the same answers. Ships as a library, a proxy, and an MCP server.

github.com/zereight

zereight/headroom

open ★ 12k
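
To make the idea of compressing a tool output concrete, here is a deliberately simple sketch. It is not headroom's algorithm or API, which this page does not describe. The `compress_log` function and its `head`/`tail` parameters are hypothetical, and the approach (collapse repeated lines, keep the start and end) is just one plausible way to shrink a noisy log before it reaches the model.

```python
def compress_log(text: str, head: int = 5, tail: int = 5) -> str:
    """Collapse repeated consecutive lines, then keep only the first and last few."""
    collapsed: list[str] = []
    prev, count = None, 0
    for line in text.splitlines():
        if line == prev:
            count += 1
            continue
        if count > 1:
            # Annotate the previous run once it has ended.
            collapsed[-1] = f"{prev}  [repeated {count}x]"
        collapsed.append(line)
        prev, count = line, 1
    if count > 1:
        collapsed[-1] = f"{prev}  [repeated {count}x]"

    if len(collapsed) <= head + tail:
        return "\n".join(collapsed)
    omitted = len(collapsed) - head - tail
    middle = [f"... [{omitted} lines omitted] ..."]
    # Slice from an explicit index so tail=0 yields an empty tail.
    return "\n".join(collapsed[:head] + middle + collapsed[len(collapsed) - tail:])


if __name__ == "__main__":
    noisy = "\n".join(["retrying..."] * 4 + ["done"])
    print(compress_log(noisy))
```

This sketch is lossy: it can drop a line the model needed. A production compressor has to be far more careful about what it discards to preserve answer quality.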

5

code mode (general, tool)

Cloudflare's Code Mode: instead of handing MCP tools to the model directly, it converts them into a TypeScript API the model writes code against — run in a Workers sandbox. Tool schemas and intermediate results stay out of the context window, so token use drops sharply.

github.com/cloudflare

cloudflare/agents

open ★ 5.1k
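
The reason this saves tokens can be shown with a small analogy. Code Mode itself generates a TypeScript API and runs the model's code in a Workers sandbox; the sketch below only mimics the shape of that idea in Python. The `list_issues` tool, the repository name, and `run_in_sandbox` are hypothetical, and `exec` with emptied builtins is not a real security boundary.

```python
import json


def list_issues(repo: str) -> list[dict]:
    """Hypothetical tool; a real setup would call an MCP server here."""
    return [
        {"id": i, "title": f"Issue {i}", "open": i % 3 == 0, "body": "x" * 500}
        for i in range(1, 31)
    ]


TOOLS = {"list_issues": list_issues}

# Code the model might write against the tool API, instead of calling the
# tool directly and receiving every issue body in its context window.
MODEL_CODE = """
issues = list_issues("acme/widgets")
result = [issue["id"] for issue in issues if issue["open"]]
"""


def run_in_sandbox(code: str) -> object:
    """Run model-written code with only the tools in scope; return `result`.

    Illustration only: emptying __builtins__ does NOT make exec safe.
    """
    scope = {"__builtins__": {}, **TOOLS}
    exec(code, scope)
    return scope.get("result")


if __name__ == "__main__":
    raw = json.dumps(list_issues("acme/widgets"))
    result = json.dumps(run_in_sandbox(MODEL_CODE))
    # Only `result` would be returned to the model's context.
    print(len(raw), "characters of raw tool output")
    print(len(result), "characters returned to the model")
```

The intermediate list of 30 issues, bodies included, never leaves the sandbox. The model sees only the ten matching ids.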

6

fan out subagents (general, technique)

Run independent investigations in parallel subagents so the parent context stays clean and the wall-clock time is set by the slowest investigation rather than the sum of all of them.

workflow
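
The fan-out pattern can be sketched with `asyncio`. Here `run_subagent` is a hypothetical stand-in for whatever actually launches a subagent, and the sleep simulates its work. What matters is that each subagent returns only a short summary, so the parent never sees the intermediate steps.

```python
import asyncio
import time


async def run_subagent(task: str) -> str:
    """Stand-in for a real subagent call (hypothetical); returns a short summary."""
    await asyncio.sleep(0.1)  # simulate independent work
    return f"summary of: {task}"


async def fan_out(tasks: list[str]) -> list[str]:
    """Run all investigations concurrently; results come back in task order."""
    return await asyncio.gather(*(run_subagent(t) for t in tasks))


if __name__ == "__main__":
    tasks = ["audit auth module", "profile slow query", "check flaky test"]
    start = time.perf_counter()
    summaries = asyncio.run(fan_out(tasks))
    elapsed = time.perf_counter() - start
    for s in summaries:
        print(s)
    # Three 0.1 s tasks run concurrently, so this is close to 0.1 s, not 0.3 s.
    print(f"elapsed: {elapsed:.2f}s")
```

Only the three summary strings enter the parent's context. This works only when the investigations are truly independent: if one needs another's result, it has to run afterwards.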

7

/goal (general, technique)

Start a session by writing explicit success criteria, non-goals, and the riskiest unknown so the agent has something concrete to steer against.

workflow
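
One way to make this repeatable is a small template that forces all three parts to be written down. The `Goal` class, its field names, and the rendered layout are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    success_criteria: list[str]
    non_goals: list[str] = field(default_factory=list)
    riskiest_unknown: str = ""

    def render(self) -> str:
        """Render the goal as plain text to paste at the start of a session."""
        lines = ["Success criteria:"]
        lines += [f"- {c}" for c in self.success_criteria]
        lines.append("Non-goals:")
        lines += [f"- {n}" for n in self.non_goals] or ["- (none stated)"]
        unknown = self.riskiest_unknown or "(not identified yet)"
        lines.append(f"Riskiest unknown: {unknown}")
        return "\n".join(lines)


if __name__ == "__main__":
    goal = Goal(
        success_criteria=["login works with SSO", "existing tests still pass"],
        non_goals=["redesigning the settings page"],
        riskiest_unknown="whether the identity provider supports refresh tokens",
    )
    print(goal.render())
```

The placeholders for empty fields are deliberate: a missing non-goal or unknown shows up in the opening prompt instead of being silently skipped.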

8

ask-expert-mcp (general, technique)

Let a cheap or open-source model escalate to a stronger one only when it gets stuck — so you pay frontier prices for the hard 5%, not the easy 95%.

workflow
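
The escalation pattern can be sketched as a wrapper around two model calls. This is not ask-expert-mcp's actual interface: `cheap_model`, `expert_model`, and the convention that a stuck model returns `None` are all assumptions made for the example.

```python
from typing import Callable, Optional

CheapModel = Callable[[str], Optional[str]]  # returns None when it is stuck
ExpertModel = Callable[[str], str]


def answer_with_escalation(
    prompt: str, cheap: CheapModel, expert: ExpertModel
) -> tuple[str, str]:
    """Try the cheap model first; call the expert only if the cheap one gives up."""
    draft = cheap(prompt)
    if draft is not None:
        return draft, "cheap"
    return expert(prompt), "expert"


def cheap_model(prompt: str) -> Optional[str]:
    """Toy stand-in: gives up on anything that asks for a proof."""
    return None if "prove" in prompt.lower() else f"cheap: {prompt}"


def expert_model(prompt: str) -> str:
    """Toy stand-in for a frontier model."""
    return f"expert: {prompt}"


if __name__ == "__main__":
    prompts = ["rename x to count"] * 19 + ["prove the lock ordering is safe"]
    used = [answer_with_escalation(p, cheap_model, expert_model)[1] for p in prompts]
    print(used.count("expert"), "of", len(used), "prompts escalated")  # prints 1 of 20 prompts escalated
```

In this toy run one prompt in twenty reaches the expert, matching the 5% figure above. The hard part in practice is detecting "stuck" reliably: a cheap model that answers wrongly without giving up never triggers the escalation.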

How they compose

caveman cuts output tokens; free-claude-code and claude-code-router change the model bill (the latter by routing through OpenRouter to the cheapest capable model); headroom strips 60–95% of the tokens out of context before they ever reach the model; code mode keeps MCP tool schemas and intermediate results in a sandbox instead of the context window; fan out subagents keeps the parent context smaller; /goal keeps the work from drifting; and ask-expert-mcp reserves frontier-model spend for the hard 5% of decisions.
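
Because these savings act on different parts of the bill, they stack. A rough cost model shows how. The token counts and per-million-token prices below are hypothetical, and applying headroom's reduction to all input tokens is a simplifying assumption, since it only compresses tool outputs, logs, files, and RAG chunks.

```python
def session_cost(
    input_tokens: int,
    output_tokens: int,
    in_price: float,   # assumed price per million input tokens
    out_price: float,  # assumed price per million output tokens
    input_reduction: float = 0.0,   # fraction of input tokens removed
    output_reduction: float = 0.0,  # fraction of output tokens removed
) -> float:
    """Estimated session cost after independent input and output reductions."""
    kept_in = input_tokens * (1 - input_reduction)
    kept_out = output_tokens * (1 - output_reduction)
    return (kept_in * in_price + kept_out * out_price) / 1_000_000


if __name__ == "__main__":
    base = dict(input_tokens=2_000_000, output_tokens=200_000,
                in_price=3.0, out_price=15.0)
    print(session_cost(**base))                         # baseline
    print(session_cost(**base, output_reduction=0.75))  # ~75% fewer output tokens
    print(session_cost(**base, input_reduction=0.60))   # low end of 60-95%
    print(session_cost(**base, input_reduction=0.60, output_reduction=0.75))
```

With these made-up numbers the baseline is about 9.00, the output cut alone gives about 6.75, the input cut alone about 5.40, and both together about 3.15. Input tokens dominate in this example, so the context-side reductions move the total more than the output-side one, but the balance depends on the workload.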