Save tokens. Same agent.
Four ways to drop your Claude Code spend without trading away accuracy. Two ship as installable skills; two are pure technique — no install, no diff, you just use them.
caveman
Talk-like-caveman skill that drops filler and replies in technical fragments. ~65% output-token reduction across coding tasks, technical content intact.
curl --create-dirs -fsSL https://skillmake.xyz/i/caveman -o ~/.claude/skills/caveman/SKILL.md
Claude Code (and Codex / Gemini / Cursor / Windsurf / Cline / Copilot — 30+ agents) skill that drops filler and replies in technical fragments. Average 65% output-token reduction across coding tasks, with the technical content intact. Brain still big. Mouth small.
Levels: lite · full (default) · ultra · wenyan. Plus sub-skills for commit messages, PR reviews, session stats, and a /caveman-compress that rewrites CLAUDE.md / memory files for ~46% input-token savings every session — savings stack forever, not just per reply.
# Mac / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
# Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
Trigger: type /caveman or say “talk like caveman”. Stop with “normal mode”. Statusline shows [CAVEMAN] ⛏ 12.4k lifetime tokens saved.
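The kind of filler-stripping that drives the reduction, sketched with a made-up before/after (illustrative only, not the skill's actual output):

```shell
# hedged illustration: a verbose reply vs. its caveman rewrite (both invented here)
before='I went ahead and updated the handler so that it now correctly returns a 404 response whenever the requested user cannot be found in the database.'
after='Handler now returns 404 when user not found.'

# count words portably ($((...)) strips wc's leading padding on macOS)
b=$(echo "$before" | wc -w)
a=$(echo "$after" | wc -w)
echo "before: $((b)) words, after: $((a)) words"   # 26 words down to 8, roughly 69% fewer
```

Same facts survive (handler, 404, user-not-found); only the politeness padding goes.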
free-claude-code
Local proxy that speaks the Anthropic Messages API to Claude Code and routes each request to a free or cheap provider — NVIDIA NIM, Kimi, Wafer, OpenRouter, DeepSeek, LM Studio, llama.cpp, or Ollama.
curl --create-dirs -fsSL https://skillmake.xyz/i/free-claude-code -o ~/.claude/skills/free-claude-code/SKILL.md
Local proxy that speaks the Anthropic Messages API to Claude Code and translates each request to whichever provider you configure — NVIDIA NIM (free key), Kimi, Wafer, OpenRouter, DeepSeek, LM Studio, llama.cpp, or Ollama. Claude Code itself doesn't change. Streaming, tool use, reasoning blocks all work.
Per-tier routing means MODEL_OPUS, MODEL_SONNET, and MODEL_HAIKU can each point at a different backend — cheap defaults, premium fallback. Local Admin UI at /admin handles keys without an env-var dance.
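A per-tier routing sketch in env-var form. The `MODEL_OPUS` / `MODEL_SONNET` / `MODEL_HAIKU` names come from the proxy docs above, but the specific backend model IDs here are invented for illustration; substitute models your configured providers actually serve, or set keys in the Admin UI instead:

```shell
# hypothetical per-tier routing: each Claude Code tier points at a different backend
# (model IDs below are placeholders, not recommendations)
export MODEL_OPUS="openrouter/deepseek/deepseek-r1"       # premium tier: strongest remote model
export MODEL_SONNET="nim/meta/llama-3.3-70b-instruct"     # everyday tier: free NIM key
export MODEL_HAIKU="ollama/qwen2.5-coder:7b"              # background tier: local, costs nothing
```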
# 1. install Claude Code (the real one)
npm install -g @anthropic-ai/claude-code

# 2. install uv + Python 3.14 (macOS / Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14

# 3. install the proxy
uv tool install --force git+https://github.com/Alishahryar1/free-claude-code.git

# 4. start it
fcc-server   # → http://127.0.0.1:8082/admin (paste your NVIDIA NIM key)

# 5. launch Claude Code through the proxy
fcc-claude
Admin UI is loopback-only by design — don't tunnel it without auth in front. Discord / Telegram bot wrappers and Whisper voice transcription are optional extras.
fan out subagents
Spawn multiple Task subagents in a single message. Each runs in its own context window, so the parent doesn't pay for every grep — and independent work actually runs concurrently.
The single biggest accuracy + cost win in long sessions: when work is independent, send one message with multiple Task tool blocks. Each subagent gets a fresh context, returns just a summary, and the parent context stays clean.
Anti-patterns this kills:
- Sequential greps — three back-to-back grep calls in the parent burn three round-trips of full context. One fan-out = one round-trip plus three short summaries.
- Long investigations in the parent — a 30-file walk inflates the parent transcript forever. The subagent reads 30 files, returns one paragraph; the parent never saw the noise.
- One-after-another research — independent queries done serially wait on each other. Fan-out runs them at the same wall-clock minute.
# Mental model — single message, N tool calls, all run in parallel
Agent: [Task: search for hooks usage] [Task: search for context usage] [Task: search for ref usage]

# vs. the slow / context-hungry version:
Agent: [Task: search for hooks usage] → result
Agent: [Task: search for context usage] → result
Agent: [Task: search for ref usage] → result
Heuristic: if you'd open three terminal tabs to do it, fan it out. If the next step depends on the previous result, don't — that's sequential by nature.
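The "three terminal tabs" heuristic taken literally, as a shell sketch: independent jobs fanned out with `&` and joined once with `wait`, the same shape as one message carrying N Task blocks (`investigate` is a stand-in for a subagent, invented for this demo):

```shell
# stand-in for a subagent: slow work in, short summary out
investigate() {
    sleep 1                        # simulates a search / multi-file walk
    echo "summary: $1"
}

SECONDS=0
investigate "hooks usage"   > /tmp/hooks.out &
investigate "context usage" > /tmp/context.out &
investigate "ref usage"     > /tmp/ref.out &
wait                               # the parent blocks once, not three times

cat /tmp/hooks.out /tmp/context.out /tmp/ref.out
echo "wall clock: ${SECONDS}s"     # ~1s fanned out, vs ~3s run serially
```

The parent only ever sees the three one-line summaries, not the work behind them, which is exactly the context-hygiene point above.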
/goal
Start the session by asking the agent to write down the explicit goal + success criteria + non-goals. Pin it. Refer back when scope creeps.
The agent will happily follow you into yak-shaving for an hour. The trick is to make it write the goal before doing any work — and then make it check itself against that goal at every meaningful branch point.
Concretely: at the top of a non-trivial session, ask for a 3-line statement of (a) what done looks like, (b) what we're intentionally not doing, and (c) the riskiest unknown. Save that to a scratch file or pin it in the prompt. When the agent proposes a fourth tangent, paste the goal back as a one-liner — drift disappears.
# Use at the top of a session
You: /goal — write our explicit goal, success criteria, and non-goals
for this session before we start.
Agent: GOAL
Ship the /powerhouse page with 4 skill blocks + videos.
SUCCESS
- /powerhouse renders, MP4s autoplay, marketplace links resolve
- typecheck clean
- deployed to skillmake.xyz
NON-GOALS
- reworking the global header (separate task)
- publishing to hyperframes.dev cloud
RISKIEST UNKNOWN
- whether the existing build-and-publish-demos.mjs handles non-mp- seeds without code changes

Pair with fan out subagents — once the goal is locked, you can confidently send independent pieces of it to parallel subagents without losing the thread.
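The "save it to a scratch file" step can be as small as this (the filename is made up; anything the agent can re-read later works):

```shell
# hypothetical scratch-file pin: the goal survives context compaction
# because any later prompt can just re-read the file
cat > .session-goal.md <<'EOF'
GOAL: ship the /powerhouse page with 4 skill blocks + videos
NON-GOALS: global header rework; hyperframes.dev cloud publishing
EOF

# when the agent proposes a tangent, paste the goal back as one line:
tr '\n' ' ' < .session-goal.md
```

One line of goal pasted back costs a handful of tokens; an hour of drift costs thousands.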
Stack the wins
The four compose cleanly. caveman cuts ~65% of output tokens at the model; free-claude-code changes who you pay for the tokens that remain; fan out subagents stops the parent context from inflating in the first place; and /goal keeps the whole stack pointed at the right target.
Got another trick?
If you've found a token-saver worth shipping — submit it. Same curation pipeline as the rest of the marketplace.