Save tokens. Same agent.
Four ways to drop your Claude Code spend without trading away accuracy. Two ship as installable skills; two are pure technique — no install, no diff, you just use them.
caveman
Talk-like-caveman skill that drops filler and replies in technical fragments. ~65% output-token reduction across coding tasks, technical content intact.
curl --create-dirs -fsSL https://skillmake.xyz/i/caveman -o ~/.claude/skills/caveman/SKILL.md
Claude Code (and Codex / Gemini / Cursor / Windsurf / Cline / Copilot — 30+ agents) skill that drops filler and replies in technical fragments. Average 65% output-token reduction across coding tasks, with the technical content intact. Brain still big. Mouth small.
Levels: lite · full (default) · ultra · wenyan. Plus sub-skills for commit messages, PR reviews, session stats, and a /caveman-compress that rewrites CLAUDE.md / memory files for ~46% input-token savings every session — savings stack forever, not just per reply.
# Mac / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
# Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
Trigger: type /caveman or say “talk like caveman”. Stop with “normal mode”. Statusline shows [CAVEMAN] ⛏ 12.4k lifetime tokens saved.
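The kind of filler-stripping that drives the reduction, sketched with a made-up before/after (illustrative only, not the skill's actual output):

```shell
# hedged illustration: a verbose reply vs. its caveman rewrite (both invented here)
before='I went ahead and updated the handler so that it now correctly returns a 404 response whenever the requested user cannot be found in the database.'
after='Handler now returns 404 when user not found.'

# count words portably ($((...)) strips wc's leading padding on macOS)
b=$(echo "$before" | wc -w)
a=$(echo "$after" | wc -w)
echo "before: $((b)) words, after: $((a)) words"   # 26 words down to 8, roughly 69% fewer
```

Same facts survive (handler, 404, user-not-found); only the politeness padding goes.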
free-claude-code
Local proxy that speaks the Anthropic Messages API to Claude Code and routes each request to a free or cheap provider — NVIDIA NIM, Kimi, Wafer, OpenRouter, DeepSeek, LM Studio, llama.cpp, or Ollama.
curl --create-dirs -fsSL https://skillmake.xyz/i/free-claude-code -o ~/.claude/skills/free-claude-code/SKILL.md
Local proxy that speaks the Anthropic Messages API to Claude Code and translates each request to whichever provider you configure — NVIDIA NIM (free key), Kimi, Wafer, OpenRouter, DeepSeek, LM Studio, llama.cpp, or Ollama. Claude Code itself doesn't change. Streaming, tool use, reasoning blocks all work.
Per-tier routing means MODEL_OPUS, MODEL_SONNET, and MODEL_HAIKU can each point at a different backend — cheap defaults, premium fallback. Local Admin UI at /admin handles keys without an env-var dance.
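A per-tier routing sketch in env-var form. The `MODEL_OPUS` / `MODEL_SONNET` / `MODEL_HAIKU` names come from the proxy docs above, but the specific backend model IDs here are invented for illustration; substitute models your configured providers actually serve, or set keys in the Admin UI instead:

```shell
# hypothetical per-tier routing: each Claude Code tier points at a different backend
# (model IDs below are placeholders, not recommendations)
export MODEL_OPUS="openrouter/deepseek/deepseek-r1"       # premium tier: strongest remote model
export MODEL_SONNET="nim/meta/llama-3.3-70b-instruct"     # everyday tier: free NIM key
export MODEL_HAIKU="ollama/qwen2.5-coder:7b"              # background tier: local, costs nothing
```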
# 1. install Claude Code (the real one)
npm install -g @anthropic-ai/claude-code

# 2. install uv + Python 3.14 (macOS / Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14

# 3. install the proxy
uv tool install --force git+https://github.com/Alishahryar1/free-claude-code.git

# 4. start it
fcc-server   # → http://127.0.0.1:8082/admin (paste your NVIDIA NIM key)

# 5. launch Claude Code through the proxy
fcc-claude
Admin UI is loopback-only by design — don't tunnel it without auth in front. Discord / Telegram bot wrappers and Whisper voice transcription are optional extras.
fan out subagents
Spawn multiple Task subagents in a single message. Each runs in its own context window, so the parent doesn't pay for every grep — and independent work actually runs concurrently.
The single biggest accuracy + cost win in long sessions: when work is independent, send one message with multiple Task tool blocks. Each subagent gets a fresh context, returns just a summary, and the parent context stays clean.
Anti-patterns this kills:
- Sequential greps — three back-to-back grep calls in the parent burn three round-trips of full context. One fan-out = one round-trip plus three short summaries.
- Long investigations in the parent — a 30-file walk inflates the parent transcript forever. The subagent reads 30 files, returns one paragraph; the parent never saw the noise.
- One-after-another research — independent queries done serially wait on each other. Fan-out runs them at the same wall-clock minute.
# Mental model — single message, N tool calls, all run in parallel
Agent: [Task: search for hooks usage] [Task: search for context usage] [Task: search for ref usage]

# vs. the slow / context-hungry version:
Agent: [Task: search for hooks usage] → result
Agent: [Task: search for context usage] → result
Agent: [Task: search for ref usage] → result
Heuristic: if you'd open three terminal tabs to do it, fan it out. If the next step depends on the previous result, don't — that's sequential by nature.
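The "three terminal tabs" heuristic taken literally, as a shell sketch: independent jobs fanned out with `&` and joined once with `wait`, the same shape as one message carrying N Task blocks (`investigate` is a stand-in for a subagent, invented for this demo):

```shell
# stand-in for a subagent: slow work in, short summary out
investigate() {
    sleep 1                        # simulates a search / multi-file walk
    echo "summary: $1"
}

SECONDS=0
investigate "hooks usage"   > /tmp/hooks.out &
investigate "context usage" > /tmp/context.out &
investigate "ref usage"     > /tmp/ref.out &
wait                               # the parent blocks once, not three times

cat /tmp/hooks.out /tmp/context.out /tmp/ref.out
echo "wall clock: ${SECONDS}s"     # ~1s fanned out, vs ~3s run serially
```

The parent only ever sees the three one-line summaries, not the work behind them, which is exactly the context-hygiene point above.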
/goal
Start the session by asking the agent to write down the explicit goal + success criteria + non-goals. Pin it. Refer back when scope creeps.
The agent will happily follow you into yak-shaving for an hour. The trick is to make it write the goal before doing any work — and then make it check itself against that goal at every meaningful branch point.
Concretely: at the top of a non-trivial session, ask for a 3-line statement of (a) what done looks like, (b) what we're intentionally not doing, and (c) the riskiest unknown. Save that to a scratch file or pin it in the prompt. When the agent proposes a fourth tangent, paste the goal back as a one-liner — drift disappears.
# Use at the top of a session
You: /goal — write our explicit goal, success criteria, and non-goals
for this session before we start.
Agent: GOAL
Ship the /powerhouse page with 4 skill blocks + videos.
SUCCESS
- /powerhouse renders, MP4s autoplay, marketplace links resolve
- typecheck clean
- deployed to skillmake.xyz
NON-GOALS
- reworking the global header (separate task)
- publishing to hyperframes.dev cloud
RISKIEST UNKNOWN
- whether the existing build-and-publish-demos.mjs handles non-mp- seeds without code changes

Pair with fan out subagents — once the goal is locked, you can confidently send independent pieces of it to parallel subagents without losing the thread.
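The "save it to a scratch file" step can be as small as this (the filename is made up; anything the agent can re-read later works):

```shell
# hypothetical scratch-file pin: the goal survives context compaction
# because any later prompt can just re-read the file
cat > .session-goal.md <<'EOF'
GOAL: ship the /powerhouse page with 4 skill blocks + videos
NON-GOALS: global header rework; hyperframes.dev cloud publishing
EOF

# when the agent proposes a tangent, paste the goal back as one line:
tr '\n' ' ' < .session-goal.md
```

One line of goal pasted back costs a handful of tokens; an hour of drift costs thousands.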
Stack the wins
The four compose cleanly. caveman cuts ~65% of output tokens at the model; free-claude-code changes who you pay for the tokens that remain; fan out subagents stops the parent context from inflating in the first place; and /goal keeps the whole stack pointed at the right target.
Got another trick?
If you've found a token-saver worth shipping — submit it. Same curation pipeline as the rest of the marketplace.