Master Claude Tokens: Avoid Session Limits Forever
Token costs compound as Claude rereads the full history on every message. Rewinding with /rewind, writing manual summaries before /clear, delegating to sub-agents, and converting inputs to markdown keep sessions lean and performant well below the 1M window.
Token Compounding Drives Non-Linear Costs
Claude rereads the entire conversation from the start on every new message and charges for every token it rereads, so costs grow non-linearly. A single message might cost 500 tokens, but by message 30 you are paying roughly 15,500, because the full history is reread each time; in a 100+ message chat, 98.5% of tokens can go to rereading old history. Startup overhead alone burns 8,000-62,000 tokens on system prompts, files, tools, and skills before you type anything.

> "Every time that you send a message, Claude rereads the entire conversation from the beginning. And all of those are tokens that it's charging you for."
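A quick back-of-the-envelope sketch makes the compounding concrete. It assumes a flat 500 tokens per message and ignores prompt caching and startup overhead, so the exact figures are illustrative rather than Anthropic's billing formula; the shape of the growth is the point.

```python
# Back-of-the-envelope model of compounding input tokens.
# Assumption (not Anthropic's billing formula): every message adds a flat
# 500 tokens, and sending message n rereads all n messages. Prompt caching
# and startup overhead are ignored.

TOKENS_PER_MESSAGE = 500

def turn_cost(n: int) -> int:
    """Input tokens billed when sending message n (full history reread)."""
    return n * TOKENS_PER_MESSAGE

def session_cost(messages: int) -> int:
    """Total input tokens across the whole session."""
    return sum(turn_cost(n) for n in range(1, messages + 1))

for n in (1, 30, 100):
    print(f"message {n:>3}: {turn_cost(n):>7,} this turn, {session_cost(n):>9,} total")

# Share of a 100-message session spent rereading old history:
fresh = 100 * TOKENS_PER_MESSAGE          # tokens if nothing were ever reread
total = session_cost(100)
print(f"reread share: {1 - fresh / total:.1%}")   # ~98%, in line with the quoted 98.5%
```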
Check your baseline with /context in a fresh session to spot bloat. Output tokens cost more than input tokens, but forcing concise responses yields little savings, since hidden output (tool calls, cache writes) dominates.
Context Rot Degrades Performance—Act Early
As the context fills, "context rot" (think AI dementia) spreads attention thin, causing contradictions, forgotten details, vague outputs, and unreadable file edits. Retrieval accuracy falls from 92% at 256k tokens to 78% at 1M, inflating effective token needs: you end up spending 500k tokens on what a fresh 200k could do. Auto-compaction at 95% of the window fires at peak rot and retains only 20-30% of the detail, like frantically packing and forgetting the essentials.
Manual intervention at around 60% of the window (roughly 250k-600k tokens) preserves quality: prompt Claude for a full summary of progress, status, key files, decisions, and open questions, then /clear and paste the summary back in. Store artifacts externally (task lists, decision logs, spreadsheets) so resets feel seamless, like closing Chrome tabs with your bookmarks intact.

> "Retrieval accuracy drops from 92% at 256,000 tokens all the way down to 78% at a million tokens."
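A minimal sketch of what that handoff could look like, assuming a simple template; the prompt, section headings, and example values below are illustrative, not an official format.

```python
# Illustrative skeleton for a session-handoff note to paste back after /clear.
# The prompt wording, headings, and example values are assumptions.

HANDOFF_PROMPT = (
    "Before I reset this session, write a handoff summary covering: "
    "current progress and status, key files touched, decisions made, "
    "open questions, and the exact next task."
)

HANDOFF_TEMPLATE = """\
## Session handoff
**Progress / status:** {status}
**Key files:** {files}
**Decisions:** {decisions}
**Open questions:** {questions}
**Next task:** {next_task}
"""

# Example of a filled-in note (values are made up for illustration).
note = HANDOFF_TEMPLATE.format(
    status="Auth refactor 80% done; tests passing except rate-limit suite",
    files="src/auth/session.ts, src/auth/middleware.ts",
    decisions="Kept JWT; deferred refresh-token rotation",
    questions="Do we need per-tenant expiry?",
    next_task="Fix the two failing rate-limit tests, then wire up logout",
)
print(note)  # save to an external decision log, /clear, then paste it back in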
The 1M window is insurance, not a target; don't fill it if you want to keep performance sharp.
Rewind, Sub-Agents, and Custom Handoffs Reset Cleanly
After each Claude response, choose your next move strategically instead of typing "continue" forever:
- /rewind: Anthropic's top recommended habit. Double-tap Escape or run /rewind to jump back to any prior message, dropping everything after it. Failed attempts pollute context; rewinding clears them out so later responses stay accurate. Ask Claude to "summarize from here" first to capture handoff notes: "Here's what we figured out. Do it this way."
- Avoid /compact: It loses fidelity. Instead, a custom "session handoff" skill analyzes the full history and outputs a structured pickup note (starting point, decisions shipped or deferred, key files, open questions, next task). Copy it, /clear, paste it back, and Claude reorients instantly; one example did this at 224k tokens.
- Sub-agents: Delegate research and summaries to fresh context windows (e.g., "Spin up a sub-agent on Haiku to review the codebase"). Only the synthesized output comes back, like a research intern's report, so the main session doesn't bloat. Cheaper models match Opus quality for grunt work; a sketch of the pattern follows below.
Rule tweak: use /clear for new tasks, and even for continuations when you have a handoff note; external logs make the reset feel continuous. The skill and guide are free in the community.
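In Claude Code the sub-agent is just a prompt, but the same pattern can be sketched directly against the Anthropic SDK: do the heavy reading on a cheap model in its own throwaway context and bring only the synthesized answer back. The model name, prompt, and file path below are illustrative assumptions.

```python
# Sketch of the sub-agent pattern via the Anthropic SDK: burn tokens in a
# separate, disposable context on a cheap model, keep only the synthesis.
# Model name, prompts, and file path are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def research_subagent(question: str, material: str) -> str:
    """Run the grunt work in its own context and return only the summary."""
    reply = client.messages.create(
        model="claude-3-5-haiku-latest",   # cheap model for grunt work
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"{question}\n\nMaterial to review:\n{material}",
        }],
    )
    return reply.content[0].text  # only this string enters the main session

summary = research_subagent(
    "Summarize how authentication is wired up and list the key files.",
    open("docs/architecture.md").read(),   # hypothetical file
)
print(summary)
```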
Markdown Discipline and Planning Minimize Input Bloat
Convert inputs to markdown for 33-90% token savings: HTML (90%), PDF (65-70%), DOCX (33%), since the tokenizer no longer wades through layout noise. Tools like Docling do the conversion in seconds; a 40-page PDF costs roughly as many tokens as 130 pages of markdown. Stick to text and use vision/OCR sparingly.
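Assuming the tool referenced is IBM's Docling, the conversion is a few lines; the file names and the 4-characters-per-token estimate below are illustrative.

```python
# Convert a PDF to markdown before handing it to Claude (assuming the tool
# referenced is IBM's Docling; pip install docling). File names are examples.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")          # also handles DOCX, HTML, ...
markdown = result.document.export_to_markdown()

with open("report.md", "w", encoding="utf-8") as f:
    f.write(markdown)

# Rough token estimate (~4 characters per token) to eyeball the savings.
print(f"~{len(markdown) // 4:,} tokens as markdown")
```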
Start in plan mode (Boris Cherny-style): spend tokens up front clarifying via Ultra Plan or Superpowers prompts so the implementation lands in one shot, with no correction loops. Keep CLAUDE.md under 200 lines (~2k tokens) since it loads in every session; route specialized instructions to on-demand context files or skills. Use .claudeignore to exclude parts of the repo.
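A quick way to stay honest about the 200-line / ~2k-token budget is a small check script; the 4-characters-per-token heuristic is an approximation, not the real tokenizer.

```python
# Sanity check that CLAUDE.md stays within the suggested budget
# (200 lines / ~2k tokens). The chars/4 heuristic approximates the tokenizer.
from pathlib import Path

MAX_LINES, MAX_TOKENS = 200, 2_000

text = Path("CLAUDE.md").read_text(encoding="utf-8")
lines = text.count("\n") + 1
approx_tokens = len(text) // 4

print(f"CLAUDE.md: {lines} lines, ~{approx_tokens} tokens")
if lines > MAX_LINES or approx_tokens > MAX_TOKENS:
    raise SystemExit("CLAUDE.md is over budget; move details into on-demand skills/files")
```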
Ask side questions through the /btw overlay, which answers without polluting the history. Keep the session limit visible (desktop app on a second monitor); push hard when a reset is near (agent teams, heavy code runs) and pause when the limit is low (take a walk, grab a snack).
Track Usage to Reverse-Engineer Savings
A custom token dashboard (public repo forthcoming) breaks down sessions and turns by input, output, cache read, and cache creation tokens across models, projects, tools, and prompts. It reveals imbalances, e.g., 2M extra input tokens from mass file reads. Past 7-day and 30-day views inform habits.
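A minimal version of that breakdown can be sketched over Claude Code's local JSONL transcripts. The directory layout and usage field names below are assumptions from observed files, not a documented API, so adjust them to what you actually find on disk.

```python
# Sketch of a tiny token dashboard over Claude Code's local JSONL transcripts.
# Assumes transcripts live under ~/.claude/projects/*/*.jsonl with a
# message.usage block per assistant turn; this layout is an assumption,
# not a documented API.
import json
from collections import Counter
from pathlib import Path

totals: Counter = Counter()

for path in Path.home().glob(".claude/projects/*/*.jsonl"):
    for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        usage = (entry.get("message") or {}).get("usage") or {}
        for key in ("input_tokens", "output_tokens",
                    "cache_read_input_tokens", "cache_creation_input_tokens"):
            totals[key] += usage.get(key, 0)

for key, value in totals.most_common():
    print(f"{key:>30}: {value:,}")
```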
"One developer actually tracked a 100 plus message chat and found that 98.5% of all the tokens were just spent rereading the old chat history in the session. Like that's a huge waste."
Ten token-saving frameworks are detailed in the free resource guide.
Key Takeaways
- Baseline with /context in a fresh session; trim any startup bloat over 8k tokens immediately.
- Rewind failures with /rewind and ask for a summary handoff after every response.
- At 10-60% of the window (120k-600k tokens), prompt a custom handoff summary, /clear, and paste it back; store logs externally.
- Delegate grunt/research to cheap sub-agents (Haiku); get outputs only.
- Convert everything to markdown (Docling); use plan mode first for one-shots.
- Keep CLAUDE.md under 200 lines; use .claudeignore for repo exclusions; route side questions through /btw.
- Watch the session limit live; push hard just before a reset, pause when it runs low.
- Dashboard tokens by prompt/project to spot leaks.
- Manual at 60% beats auto at 95%; 1M is backup, not goal.
- Free skill/dashboard/guide in community for instant setup.