Fix Claude Code Limits with Token Optimizations
The Pro plan allows roughly 45 messages per 5-hour window; extend sessions by using /clear and /compact, keeping claude.md under 300 lines, switching to Haiku/Sonnet, and disabling token-wasting features like auto memory.
Decode Claude Limits to Plan Usage
Claude's Pro plan ($20/mo) provides roughly 45 messages every 5 hours, with the window starting at your first message and shared across all devices and interfaces; Max gives about 225, and the Max 20x plan about 900. The effective count drops with Opus (roughly 3x more tokens than Sonnet) or compute-heavy work like tool use and multi-step reasoning. Peak hours accelerate depletion, and the window keeps elapsing even while you sit idle. Truncated error responses and injected skill listings bloat context without adding value, since retries append the partial junk instead of discarding it.
Plan upfront to avoid costly corrections: a small initial token spend on alignment prevents the 10x waste of rewrites. This shifts usage from reactive fixes to efficient execution, sustaining Pro plan workflows all day.
Slash Context Bloat in Active Sessions
Reset with /clear after completing a task (e.g., post-implementation, before testing) to drop history, so each new message no longer resends the full conversation, system prompt, and tool definitions. When you need partial retention, /compact summarizes the interaction so far, reclaiming space without losing everything.
Offload side questions with /btw, which answers them in isolation outside the main context and keeps unrelated material from bloating it. Undo misalignments with /rewind (or a double-ESC) to revert to the pre-error state, skipping the bad output and its token cost entirely.
These commands counter context growth (every reply resends all prior history), keeping requests lean and stretching Pro toward 45+ effective messages by minimizing per-turn overhead.
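That per-turn overhead is easy to quantify. A minimal sketch with assumed token counts, showing how cumulative billed input grows when every turn resends the full history, versus resetting with /clear:

```python
# Illustrative arithmetic with assumed numbers: each turn resends the
# entire prior history, so billed input tokens grow quadratically
# until the context is reset.
def cumulative_input_tokens(turns, tokens_per_turn, clear_every=None):
    """Total input tokens billed across a session.

    Every request includes all prior turns' tokens; clear_every models
    running /clear, which drops the history back to zero.
    """
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn      # this turn's new content
        total += history                # full history is sent each turn
        if clear_every and turn % clear_every == 0:
            history = 0                 # /clear resets the context
    return total

no_clear = cumulative_input_tokens(30, 500)              # one long session
with_clear = cumulative_input_tokens(30, 500, clear_every=10)
```

With these assumed numbers, clearing every 10 turns cuts billed input from 232,500 tokens to 82,500: the quadratic growth restarts from zero after each reset.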
Structure Projects to Load Only Essentials
Keep claude.md under 300 lines as a high-level guide: include the dev practices Claude ignores by default (e.g., 'don't do X'), and omit redundant basics such as standard dev-server commands or file architecture Claude can deduce from names. Avoid the init-generated bloat that lists obvious filesystem navigation.
Link separate docs for specifics (e.g., the DB schema) to enable progressive loading: Claude pulls only the relevant files, not everything, each session. Use path-specific rules, skills for repetitive flows (also progressively loaded), and bundled scripts for deterministic tasks that bypass AI token use entirely.
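A minimal sketch of such a claude.md, with hypothetical file names and rules. Plain path references (rather than inlining the docs) let Claude open each file only when the task calls for it:

```markdown
# Project guide — high-level only, under 300 lines

## Practices Claude won't infer
- Don't edit files under src/generated/; rerun the codegen script instead.
- All timestamps are UTC; never use local time in tests.

## Specifics live in separate docs (read on demand)
- Database schema: see docs/db-schema.md
- Release process: see docs/release.md
```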
Hooks filter junk: for example, a script can post-process test output so only the failing cases are injected, excluding the ones that passed. For one-off instructions, prefer the append-system-prompt flag (temporary, gone at session end) over a permanent claude.md entry, which drags tokens on every turn forever.
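A sketch of such a test-output filter, assuming pytest-style output where failures are marked FAILED; how you wire it in (hook configuration, runner wrapper) depends on your setup:

```python
def keep_failures(test_output: str) -> str:
    """Return only the lines worth sending to the model: failing or
    erroring tests plus summary separators, dropping passed-test noise."""
    kept = []
    for line in test_output.splitlines():
        if "FAILED" in line or "ERROR" in line or line.startswith("="):
            kept.append(line)
    return "\n".join(kept)

# Intended use is as a filter between the runner and Claude, e.g.:
#   pytest | python filter_tests.py
# where filter_tests.py (hypothetical name) would do:
#   import sys; sys.stdout.write(keep_failures(sys.stdin.read()))
```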
Result: a focused context sustains Pro limits where token-heavy frameworks (BMAD/Spec Kit) fail, loading extra information only when a task actually needs it.
Tune Configs and Models for Low-Token Mode
Match the model to the task: Haiku for simple work, Sonnet for moderate (a large saving versus Opus's roughly 3x cost), reserving Opus for complex reasoning. Set reasoning effort to low (rather than auto/high) for tasks that need no deep thinking, saving on internal compute.
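One way to pin the cheaper default is in Claude Code's settings.json — a sketch, where the alias shown is an assumption (use whatever model identifier your version accepts), so Opus only runs when you explicitly switch:

```json
{
  "model": "sonnet"
}
```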
Disable thinking entirely for direct generation (distinct from effort: no reasoning step at all). Turn off auto memory (stops background habit tracking and consolidation), background tasks (dreaming, memory refactoring, indexing), and unused MCP servers (prevents irrelevant tool definitions from being injected).
Enable prompt caching (leave disable_prompt_caching at false) so repeated prefixes aren't billed again at full price. Cap max output tokens to curb verbose replies. Together these halt idle and background drains, extending windows even during peak hours.
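These toggles can live together in settings.json's env block — a sketch assuming environment variables along these lines; names and values are assumptions to verify against your version's documentation:

```json
{
  "env": {
    "DISABLE_PROMPT_CACHING": "0",
    "MAX_THINKING_TOKENS": "0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "4096"
  }
}
```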