Claude Token Mastery: Beat Limits, Cut Costs 90%

Optimize Claude sessions by understanding compounding token costs, compacting manually at 60% of the context window, using /re rewinds and sub-agents, converting files to markdown (90% savings on HTML), and building custom dashboards. Avoid context rot and save thousands of tokens while boosting performance.

Compounding Token Costs and Invisible Overhead Drain Sessions

Claude's 1M token context window starts with 8,000+ tokens of overhead from system prompts, conversation history, tools, files, and skills, often ballooning to 62,000 in fresh sessions. Every message forces Claude to reread the entire history, so costs compound rather than add: message 1 costs ~500 tokens, message 30 hits 15,500 (31x more), and one tracked 100+ message chat wasted 98.5% of its tokens on rereads. This "compounding, not adding" dynamic fills limits fast, especially since output tokens cost more than input, and unseen outputs (e.g., internal processing) amplify the waste.
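The compounding dynamic is simple arithmetic. A minimal sketch, assuming a flat ~500 new tokens per turn (per the article's figures) and that the full history is resent on every turn:

```python
# Rough model of compounding reread costs: each turn adds ~500 new
# tokens, but the API resends the full history, so the input cost of
# turn n is roughly n * 500 tokens.
NEW_TOKENS_PER_TURN = 500

def turn_cost(n: int) -> int:
    """Input tokens billed for turn n (prior history plus new message)."""
    return n * NEW_TOKENS_PER_TURN

def reread_fraction(turns: int) -> float:
    """Share of total input tokens spent rereading old history."""
    total = sum(turn_cost(n) for n in range(1, turns + 1))
    fresh = turns * NEW_TOKENS_PER_TURN  # genuinely new content
    return 1 - fresh / total

print(turn_cost(1))                    # 500
print(turn_cost(30))                   # 15000 -- ~30x turn 1
print(f"{reread_fraction(100):.1%}")   # ~98.0%, close to the 98.5% measured
```

Per-turn cost grows linearly and cumulative cost quadratically, which is why a 100-message session spends nearly all its budget on rereads.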

"One developer actually tracked a 100 plus message chat and found that 98.5% of all the tokens were just spent rereading the old chat history in the session. Like that's a huge waste." (Speaker highlights reread inefficiency, explaining why long sessions explode costs despite fixed per-message inputs.)

Check baseline with /context in a fresh session to spot bloat; exclude unneeded files via .claudeignore. Keep claude.md under 200 lines (~2,000 tokens) as it loads every session—offload specialized instructions to on-demand context files or skills.
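For the exclusion step, a .claudeignore might look like the sketch below. The specific patterns are hypothetical and project-dependent; the syntax follows the familiar .gitignore style:

```
# Build artifacts and dependencies: never useful as context
node_modules/
dist/
*.lock

# Large data files and binary assets
data/
*.pdf
*.png
```

Run /context again after adding it to confirm the baseline actually dropped.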

Context Rot Degrades Performance, Worsens Efficiency

As sessions grow, "context rot" (AI dementia) spreads attention thin: retrieval accuracy drops from 92% at 256k tokens to 78% at 1M. Thinking depth falls 67% in long sessions (18k thinking blocks analyzed), edit-without-reading rises from 6% to 34%. Poor performance cascades into inefficiency—you burn extra tokens fixing vague, contradictory outputs. Auto-compaction at 95% window retains only 20-30% detail, executed at peak rot when Claude is "least intelligent."

"Retrieval accuracy drops from 92% at 256,000 tokens all the way down to 78% at a million tokens. So even if you can fill up your a million token context window, the model is going to be measurably worse."

(Speaker cites stats proving long contexts hurt quality, justifying proactive resets over maxing windows.)

One user slashed costs from $345/month to $42/month with no drop in output quality, purely through better habits. Manual compaction at 60% (e.g., 250k/1M for Opus) preserves detail: prompt Claude for a full summary of progress, decisions, files, and tasks, then /clear and paste it back. This mimics closing Chrome tabs while keeping the bookmarks (plans, logs, sheets).
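A compaction prompt along these lines works; this is a sketch, not the speaker's exact wording:

```
Before I reset this session, write a complete handoff summary:
1. What we set out to do and what has shipped so far
2. Key decisions made, and why
3. Files created or modified (with paths)
4. Open tasks and known issues
5. Anything a fresh session must know to pick up from here
```

Run /clear, then paste the summary as the first message of the new session.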

Rewind, Delegate, and Reset: Anthropic's Post-Response Options

After each Claude response, choose strategically over endless "continue":

  • /re (double-tap Escape): Jump back to any prior message and drop everything after it; Anthropic's #1 recommended habit. Prevents failed attempts from polluting context (record what the broken code taught you in a decision log rather than keeping it in the window). Includes a "summarize from here" handoff note.
  • /compact vs. manual: Skip the built-in; a custom summary plus /clear at 120k tokens (12% of the window) reorients Claude without losing detail.
  • Sub-agents: Delegate to fresh windows on cheap models (e.g., Haiku for summarization). "Spin up a sub-agent to review the codebase" works like a research intern who returns only the results, keeping the fluff out of the main session.

"If you're packing for a trip... if you're frantically stuffing your bag... you're probably going to forget your charger... that's basically auto compaction at 95%." (Analogy shows why manual beats auto.)

Start in plan mode (e.g., Ultra Plan, Superpowers prompts) for upfront clarity, enabling one-shot implementations. Use /btw for side questions without history bloat.

Markdown Conversion and Monitoring Habits Triple Capacity

Convert inputs to markdown for massive savings: HTML yields 90% fewer tokens, PDF 65-70%, DOCX 33%, fitting roughly 3x more content (the window that holds a 40-page PDF holds 130 pages of markdown). Tools like Docling handle the conversion in seconds; skip it when you need OCR or vision.
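Why HTML is so wasteful is easy to demonstrate: tags, attributes, and styling consume tokens without carrying content. A minimal sketch using only the standard library and the rough 4-characters-per-token rule of thumb (real tokenizers vary; this illustrates the principle, it is not Docling itself):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the visible text, discarding tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def rough_tokens(text: str) -> int:
    # Common rule of thumb: ~4 characters per token for English text.
    return max(1, len(text) // 4)

html = ('<div class="post" style="margin:0;padding:8px">'
        '<span class="title">Token tips</span>'
        '<p data-id="42">Convert files to markdown before uploading.</p>'
        '</div>')

extractor = TextExtractor()
extractor.feed(html)
plain = " ".join(c.strip() for c in extractor.chunks if c.strip())

print(rough_tokens(html), rough_tokens(plain))
```

Even on this tiny snippet the plain text costs a fraction of the raw HTML; on real pages, with heads, scripts, and nav markup, the gap approaches the 90% figure.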

Monitor session limits constantly (desktop app view, second monitor). Close to a limit reset? Load up heavy tasks (agent teams, full codebases). Only 50% left with 30 minutes to go? Stick to light workflows. Track usage via a custom token dashboard (GitHub repo forthcoming): sessions, turns, and input/output/cache tokens by model, project, tool, and prompt. This reveals patterns, such as 2M extra input tokens from reorganizing a project; drill into high-token prompts and sessions.
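The dashboard idea reduces to aggregating usage records. A sketch assuming a hypothetical JSONL log where each line records one API turn; the field names are illustrative, not the speaker's actual schema:

```python
import json
from collections import defaultdict

def aggregate_by_model(lines):
    """Sum input/output tokens per model from JSONL usage records."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for line in lines:
        rec = json.loads(line)
        totals[rec["model"]]["input"] += rec["input_tokens"]
        totals[rec["model"]]["output"] += rec["output_tokens"]
    return dict(totals)

# Hypothetical log entries for one project.
log = [
    '{"model": "opus", "project": "webapp", "input_tokens": 12000, "output_tokens": 800}',
    '{"model": "haiku", "project": "webapp", "input_tokens": 3000, "output_tokens": 400}',
    '{"model": "opus", "project": "webapp", "input_tokens": 15000, "output_tokens": 900}',
]
print(aggregate_by_model(log))
# opus: 27000 in / 1700 out; haiku: 3000 in / 400 out
```

Grouping by project, tool, or prompt instead is the same pattern with a different key, which is all the dashboard's breakdowns require.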

A custom /session-handoff skill automates this: at 224k tokens it outputs the session's starting goal, decisions, and shipped work, plus key files, state verification, open questions, and a "pick up from here" note. Copy, /clear, paste: fresh window, fully reoriented.
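Structured as a template, the handoff output might look like this; a sketch modeled on the fields listed above, not the actual skill:

```
## Session Handoff

**Started with:** <original goal>
**Decisions:** <what was decided, and why>
**Shipped:** <completed work>
**Key files:** <paths touched this session>
**State verification:** <commands or checks proving current state>
**Open questions:** <unresolved items>
**Pick up from here:** <exact next step>
```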

"Convert everything to markdown. Markdown is so much faster and so much cheaper... you can get roughly three times more content into the same context window."

(Speaker quantifies file-type efficiencies, prioritizing text extraction.)

Output brevity instructions (e.g., "be concise") help only minimally, since hidden outputs dominate; focus on trimming inputs instead.

Philosophy: Ditch 1M Windows for Sustainable Sessions

Long sessions make Claude "lazier and sloppier," and the stats confirm it. The philosophy: reset often, with external storage (task lists, logs), so clean contexts outperform bloated ones. The custom skills, dashboard, and repo are in the free school community; Anthropic's article and diagrams validate the approach.

"The rule of thumb... if you're starting a new task do /clear and if you're continuing the same task do /compact. And honestly I kind of disagree... this one habit alone... has probably made the most noticeable difference."

(Speaker disagrees with the official rule of thumb, favoring summary + /clear for continuity without rot.)

Key Takeaways

  • Check baseline with /context in fresh sessions; trim to <8k overhead via .claudeignore and a lean claude.md.
  • /re failed attempts early: a clean context beats retained errors; use handoff summaries.
  • Manual summary + /clear at 120-250k tokens; store plans/logs externally.
  • Delegate to Haiku sub-agents for cheap, isolated tasks.
  • Markdown all files (90% HTML savings); monitor limits, time heavy work pre-reset.
  • Build/track with token dashboard; plan mode first for one-shot execution.
  • Avoid 1M max—performance drops sharply; short, fresh sessions win.
  • Free resources: session-handoff skill, dashboard repo, Anthropic guide in school community.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge