Caveman Plugin Saves Few Tokens in Code Tasks
Caveman shortens Claude's verbose output by 65-75%, but a code-implementation benchmark showed no net savings: runs with and without the plugin each consumed 4 percentage points of plan quota, because internal thinking (Opus, high effort) and code generation dominate token costs.
Token Savings Limited by Usage Patterns
Caveman is a Claude Code plugin that compresses responses into terse, comma-separated phrases like "Plan enum service form request" instead of full sentences; its README claims 65% token cuts, and a Claude AI subreddit post cites 75%. In a benchmark implementing a project from project-description.md (API creation with tests, 3-4 minutes per run), the non-Caveman run raised usage by 4 percentage points (from 13% to 17% of a $100 Anthropic plan), and the Caveman run by the same 4 points (from 17% to 21%). There were no net savings because communication text is a minor fraction of total tokens: most of the burn happens during internal thinking (Opus 4.7, high effort) and code generation, not terminal output.
Reddit discussions confirm this: users report 30% reductions at best, and comments note that "it's not the prompts that cost the money, it's the thinking" and that Caveman "optimizes the cheapest part of the bill." The hype around its 40,000 GitHub stars and social-media buzz overlooks that code sessions involve little conversational back-and-forth.
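The arithmetic behind this is easy to sketch. Assuming an illustrative token split for an autonomous code task (these numbers are hypothetical, not measured: thinking and code generation dominate, terminal chatter is small), even a 65% cut to communication text barely moves the total:

```python
def total_tokens(thinking, codegen, communication, comm_cut=0.0):
    """Total session tokens after cutting communication text by comm_cut."""
    return thinking + codegen + communication * (1 - comm_cut)

# Hypothetical split: 60% thinking, 30% code generation, 10% communication.
baseline = total_tokens(thinking=60_000, codegen=30_000, communication=10_000)
with_caveman = total_tokens(thinking=60_000, codegen=30_000,
                            communication=10_000, comm_cut=0.65)

savings = 1 - with_caveman / baseline
print(f"{savings:.1%}")  # a 65% cut to 10% of the tokens saves only 6.5%
```

Under a real code session, where communication is closer to the 4 points measured above, the saving shrinks further into the noise.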
Invoke Manually for Chat-Intensive Work
Install via a simple command in Claude Code; no configuration is needed. Prefix prompts with "/caveman" only when you expect verbose discussion, such as iterating on implementations or weighing alternatives, where repeated shortenings compound into real savings. Avoid it by default in autonomous code tasks, where it adds negligible value and slightly slows completion (4 minutes versus 3). Test your own workflows: if communication exceeds 20-30% of your tokens, expect measurable cuts; otherwise, it's hype.
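That 20-30% rule of thumb can be turned into a quick back-of-the-envelope check. A minimal sketch, assuming the README's 65% compression rate and a hypothetical 5% savings threshold for "worth the overhead" (both parameters are assumptions, not values from the plugin):

```python
def caveman_worth_it(comm_fraction, compression=0.65, threshold=0.05):
    """Estimate total-token savings from compressing communication text.

    comm_fraction: share of session tokens that are chat/terminal output
    compression:   fraction of that text removed (README claims 65%)
    threshold:     minimum savings worth the overhead (assumed 5%)
    """
    savings = round(comm_fraction * compression, 4)
    return savings, savings >= threshold

# Autonomous code task, ~4% communication: not worth enabling.
print(caveman_worth_it(0.04))  # (0.026, False)
# Chat-heavy iteration, ~30% communication: measurable cut.
print(caveman_worth_it(0.30))  # (0.195, True)
```

The break-even point falls right around the 20-30% communication share the benchmark suggests, which is why chat-intensive work is the only place the plugin pays off.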