Caveman Plugin Barely Cuts Tokens in Claude Code Tasks

Caveman claims 65-75% token cuts by shortening AI responses, but real-world Claude Code tests show an identical 4% token-usage delta for code-implementation tasks: thinking and code generation dominate costs, not communication.

Token Savings Hype Doesn't Hold for Code Generation

Caveman is a Claude Code plugin that shortens AI responses to primitives such as comma-separated lists (e.g., "Plan enum service form request") instead of full sentences. It claims a 65% token cut in its README and 75% in a viral Claude AI Reddit post, and its examples do show single phrases shrinking dramatically, which works for chatty interactions. In production-like code tasks, however, it delivers no measurable savings: most tokens go to internal reasoning (high-effort thinking with Opus at 4.7 effort) and code output, not terminal communication. Reddit users echo this: "It's not prompts that cost money, it's thinking" and "optimizes the cheapest part of the bill."

To benchmark yourself, start a fresh Claude Code session on Anthropic's $100 plan and note the baseline usage (e.g., 13%). Run a task such as implementing a project from a description.md (3-4 minutes for API creation), then recheck (e.g., 17%, a 4% delta). Repeat in a new folder with Caveman installed via a simple slash command, no config needed. The results match: the same 4% delta, now to 21%, despite shorter plan steps and status updates like "fix tests."
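The comparison above reduces to simple before/after deltas on the session usage percentage. A minimal sketch of that bookkeeping, using the illustrative readings from the text (13% to 17% without Caveman, 17% to 21% with it); the function name is hypothetical, not part of any Claude Code tooling:

```python
def usage_delta(before_pct: float, after_pct: float) -> float:
    """Change in session usage (%) between two readings of the plan meter."""
    return after_pct - before_pct

# Readings reported in the article:
baseline = usage_delta(13.0, 17.0)  # task run without Caveman
caveman = usage_delta(17.0, 21.0)   # same task, Caveman installed

# Relative saving from the plugin, as a fraction of the baseline delta:
saving = 1 - caveman / baseline
print(f"baseline {baseline}%, caveman {caveman}%, saving {saving:.0%}")
```

With both deltas at 4%, the computed saving is 0%, which is the article's core finding: the communication tokens Caveman trims are too small a slice to move the meter.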

Core Costs Lie in Thinking and Code, Not Chat

Claude Code sessions for substantive work (e.g., full API from spec, passing test suites) use tokens primarily for:

  • High-effort internal planning (majority).
  • Code generation and iteration.
  • Minimal terminal output, which Caveman targets.

Communication is sparse (short plans, "Done live," green test passes), so even a 75% cut there yields negligible impact. The hype from 40,000 GitHub stars and social media overlooks this. Instead, invoke /caveman manually when chatting iteratively (e.g., discussing implementations), not for autonomous code tasks. The trade-off: ultra-concise output risks clarity loss in complex plans, though in these tests the suites passed identically.

Use Sparingly for Chat-Heavy Workflows

Caveman shines in discussion-heavy sessions (e.g., back-and-forth on approaches), potentially hitting the 30% savings some Reddit reports claim. For code generation, skip it; save the slash command for when verbosity bloats chats. Test your own repos: duplicate folders, run the same prompts, and compare session % usage. Bottom line: another hype-buster, with no miracles for Opus thinking modes.

Summarized by x-ai/grok-4.1-fast via openrouter

4784 input / 1364 output tokens in 8591ms

© 2026 Edge