12 Rules to Halve Claude Code Context Usage

Optimize Core Files to Minimize Baseline Context

Trim your CLAUDE.md file ruthlessly: bloated versions at 910 lines consume 45% context for project analysis, while a 33-line version drops to 41%, saving 4% per interaction. Add a rule like "When context exceeds 50%, suggest new conversations or sub-agents to reduce it," so Claude proactively flags bloat at 75% usage and proposes fixes, preventing manual compaction.

Break monolithic workflows into granular skills (e.g., one for LinkedIn posts, emails, proposals, or CSV analysis). Skills load only relevant context—analyzing a bank CSV with a dedicated skill uses 27% vs. 45% when dumping questions into a generic CLAUDE.md. Create reference files for reusables like tone or banned phrases; prompt Claude to "reference if needed, skip otherwise." Baking them in bloats skills to 457 lines (31% usage); referencing slims to 31 lines (25% usage).

For large files like 3,001-line transcripts, attach as filesystem references, not chat messages: pasting consumes 71%, referencing drops to 38%—nearly halving it. Switch models strategically: Haiku burns 33% on a simple "Hey"; Opus uses just 9%, freeing headroom for complex tasks.

Audit and Reset Conversation History

Run /context anytime to break down usage: it lists tokens/percentages by category (e.g., MCP tools, memory, skills), revealing culprits even in basic chats. Hit /clear or new tab to reset fully when at 2-3% capacity. For salvageable history, use /compact: it summarizes long threads into a tiny prompt (specify keepers like key decisions), restarting fresh without losing essence—ideal at 90-100% bloat.

Purge Persistent Overhead and Offload Tasks

Query "check all my memories" to list Claude's stored facts (e.g., 17 personal/workflow items); delete irrelevants like demo projects ("delete everything about Hierarchy") to trim hidden drag. Run "Claude MCP list" then visit claude.ai/settings/connectors to revoke unused integrations—3 connectors like Slack/Airtable already eat substantial tokens; 20-40 would explode it.

For massive tasks (e.g., junk folders with large/binary files), spawn sub-agents: prompt "use sub-agents to extract questions, action items, decisions separately." This silos work—each handles 33%, avoiding overload in the main thread. Claude defaults to this for big projects, but explicit requests ensure it, distributing context across threads for reliable outputs.