18 Hacks to 5x Claude Code Token Usage
Claude rereads the full conversation history on every message, so long chats can waste up to 98.5% of tokens on old context. Start fresh conversations, batch prompts, compact at 60% of context, and route sub-tasks to cheap models to double or triple your effective usage.
Token Mechanics Drive Exponential Waste
Claude charges tokens to reread the entire conversation history on every message, so costs compound fast: message 1 costs ~500 tokens, message 30 hits ~15,500 (31x more), and a 100+ message chat wastes 98.5% of its tokens reprocessing old history. Bloated context—auto-loaded CLAUDE.md, MCP servers (up to 18k tokens per server per message), system prompts, skills, and pasted files—also degrades output quality via 'lost in the middle': models tend to ignore content buried mid-context. Command outputs and 5-minute prompt-cache timeouts during breaks trigger full reprocessing, spiking usage. Visibility tools reveal the invisible overhead: /context shows a token breakdown, /cost shows session spend, and a terminal status line can display the model, a progress bar, and the percentage of the 1M context window used—often exposing ~51k tokens consumed by system prompts and tools before the chat even starts.
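The compounding above can be sketched with a toy model. The ~500-token message size comes from the article; the exact multiples depend on how replies and system overhead are counted, so treat this as illustrative arithmetic, not real pricing:

```python
# Toy model of per-message reread cost (illustrative ~500-token messages
# from the article; real sizes and pricing vary).

PER_MESSAGE_TOKENS = 500  # assumed average tokens per message

def message_cost(n: int) -> int:
    """Tokens charged for the n-th message: it rereads the n-1 prior
    messages and adds its own ~500 tokens."""
    return n * PER_MESSAGE_TOKENS

def session_cost(messages: int) -> int:
    """Total tokens billed over a session of `messages` messages."""
    return sum(message_cost(n) for n in range(1, messages + 1))

def waste_fraction(messages: int) -> float:
    """Share of total spend that went to rereading old history."""
    new_content = messages * PER_MESSAGE_TOKENS
    return 1 - new_content / session_cost(messages)

print(message_cost(1))                # 500
print(message_cost(30))               # 15000 -> ~30x message 1
print(round(waste_fraction(132), 3))  # 0.985 in a 130+ message chat
```

Per-message cost grows linearly with history, so total session spend grows quadratically—which is why /clear between tasks saves more than any other single habit.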
Basic Habits Slash Per-Message Costs
- Start fresh chats with /clear between unrelated tasks: each message in a long chat costs far more than the same message in a new one, so this habit extends session life the most.
- Batch multi-step prompts into one message (e.g., summarize + extract + fix) instead of three follow-ups that each restack history—roughly a 3x saving. Edit and regenerate a bad output rather than correcting it with another follow-up.
- Use plan mode first ('reach 95% confidence before making changes; ask questions') to avoid wrong-path work, the single biggest waste.
- Disconnect unused MCP servers; prefer CLI tools (e.g., for Google Workspace) where they are faster and cheaper. Paste only the essential code snippets, not full docs or files.
- Watch Claude work live so you can stop loops and rereads early—thousands of tokens can otherwise go to zero-value output.
- Keep the usage dashboard open (or automate alerts) to pace yourself.
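The batching advice falls out of the same reread arithmetic. A minimal sketch, assuming ~500-token steps and an existing 10k-token history (illustrative numbers, not real pricing):

```python
STEP_TOKENS = 500  # assumed size of each request or reply

def sequential_cost(steps: int, history: int = 0) -> int:
    """Separate follow-ups: each one rereads the prior history plus
    every earlier step's request and reply, then adds its own text."""
    total = 0
    for i in range(steps):
        total += history + (2 * i + 1) * STEP_TOKENS
    return total

def batched_cost(steps: int, history: int = 0) -> int:
    """One message holding all steps: history is reread only once."""
    return history + steps * STEP_TOKENS

print(sequential_cost(3, history=10_000))  # 34500
print(batched_cost(3, history=10_000))     # 11500
```

Three follow-ups pay the 10k-token history tax three times; the batched prompt pays it once, a ~3x saving that grows with chat length.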
Advanced Routing and Model Choices Maximize Efficiency
- Keep CLAUDE.md lean (<200 lines) and use it as an index pointing to files, skills, and docs—it is auto-read on every message, so a 1,000-line file taxes even a 'hi'.
- Be surgical with references: '@auth.js, the verifyUser function' beats a full-repo dump.
- Compact manually at ~60% of capacity (/compact with instructions on what to preserve) rather than waiting for auto-compact at 95%, where quality degrades; after 3-4 compactions, summarize and /clear.
- Let CLAUDE.md evolve: store architecture rules, decisions, and one-line learnings (<15 words) for repeated tasks, plus routing rules like 'use Haiku sub-agents for 3+ files or research'.
- Pick models deliberately: Sonnet as the default for coding, Haiku for sub-tasks and formatting (putting ~80% of tokens on the cheap model saves real money), Opus for under 20% of work, mainly planning.
- Sub-agents cost 7-10x because each reloads full context; limit them to one-off tasks.
- Schedule heavy work off-peak (afternoons, evenings, and weekends rather than 8am-2pm ET on weekdays); burn remaining allocation before a reset, and pause near limits to preserve flow.
- Hitting limits is a signal of power usage—optimize hygiene before upgrading plans.
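A lean index-style CLAUDE.md might look like the sketch below. The file names and rules are hypothetical placeholders—the point is pointers and one-liners, not inlined content:

```markdown
# Project index (keep under 200 lines)

## Architecture rules
- API handlers live in src/api/; business logic stays in src/core/.
- Use Haiku sub-agents for tasks touching 3+ files or for research.

## Pointers (read only when relevant)
- Auth flow: docs/auth.md
- DB schema: docs/schema.md
- Deploy steps: docs/deploy.md

## Learnings (one line, <15 words each)
- Tests need the local Postgres container running first.
- Lint config lives in the repo root, not per-package.
```

Because this file is reread on every message, every line here is a per-message tax—pointers let Claude pull a doc only when the task needs it.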