Caveman Rules Strip Output Tokens Without Losing Results
Caveman prompting forces LLMs like Claude to deliver concise responses by banning verbose phrases, matching GrugBrain Dev's philosophy: "Why waste time say lot word when few word do trick." Apply these rules to prompts for code fixes or explanations:
- Drop articles (no "a", "an", "the") and filler words (no "basically", "simply", "actually").
- Eliminate pleasantries: No "Sure", "Certainly", "Of course", "Happy to".
- Avoid hedging: Skip phrases like "It might be worth considering".
- Use fragments: Full sentences unnecessary.
- Keep technical terms intact (e.g., "polymorphism" stays unchanged).
- Leave code blocks and error messages verbatim—Caveman applies only to explanations around code.
Example transformation: Instead of "Sure, I'd be happy to help you with that. The issue you are experiencing is likely caused by...", the Caveman response reads "Bug in auth middleware token expiry check. Use this fix, not that one." This cuts a 69-token response to 19 tokens while preserving the fix.
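If you drive Claude through the API, one way to bolt these rules on is a system prompt. Below is a minimal sketch assuming the Anthropic Python SDK, a placeholder model name, and rule wording paraphrased from the list above rather than taken from any official Caveman spec.

```python
import anthropic

# Rule wording paraphrased from the list above; the canonical rules live in
# the Caveman repo, so treat this string as an illustrative stand-in.
CAVEMAN_SYSTEM = (
    "Respond caveman-brief. Drop articles and filler words. "
    "No pleasantries, no hedging. Sentence fragments fine. "
    "Keep technical terms exact. Code blocks and error messages verbatim."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whatever model you actually run
    max_tokens=512,
    system=CAVEMAN_SYSTEM,
    messages=[{"role": "user", "content": "Why does my JWT middleware reject fresh tokens?"}],
)

print(response.content[0].text)  # terse explanation; any code in it stays verbatim
```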
Scale intensity with levels (see the sketch after this list):
- Light: Trim fat (basic rules).
- Full: All rules.
- Ultra: Abbreviate common terms (DB, req, res, fn, impl), strip conjunctions, one-word answers if sufficient, arrow notation for causality (e.g., "X → Y").
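One way to make the levels switchable is to keep them as escalating system-prompt fragments, as in this sketch; the per-level wording here is assumed for illustration, not copied from the repo.

```python
# Assumed wording for each level; the Caveman repo defines the canonical rules.
LEVELS = {
    "light": "Trim filler words and pleasantries. Keep everything else normal.",
    "full": (
        "Drop articles and filler. No pleasantries, no hedging. "
        "Sentence fragments fine. Technical terms and code stay exact."
    ),
    "ultra": (
        "Maximum brevity. Abbreviate common terms (DB, req, res, fn, impl). "
        "Strip conjunctions. One-word answer if sufficient. "
        "Arrow notation for causality (X -> Y)."
    ),
}

def caveman_system(level: str = "full") -> str:
    """Build the system prompt for the requested intensity level."""
    return "Respond caveman-brief. " + LEVELS[level]

# Usage: pass caveman_system("ultra") as the `system` argument of an API call.
```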
Output quality matches non-Caveman responses; Claude just skips glazing you with praise like "Your insight was spot-on."
Real-World Token Savings Prove ROI
A React render-bug explanation drops from 1,180 tokens to 159 tokens (87% savings) using full Caveman. Output tokens drive Claude's costs, so this directly saves money; Claude profits from verbose soliloquies on simple topics, turning a report like "auth is broken" into a rampage.
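Back-of-the-envelope math on what that reduction is worth; the per-token price and call volume below are illustrative assumptions, not current pricing.

```python
# Numbers from the example above: 1,180 verbose tokens vs. 159 Caveman tokens per answer.
VERBOSE_TOKENS, CAVEMAN_TOKENS = 1_180, 159
PRICE_PER_M_OUTPUT = 15.00   # assumed USD per million output tokens; check real pricing
CALLS_PER_DAY = 2_000        # assumed volume

def daily_cost(tokens_per_call: int) -> float:
    return tokens_per_call * CALLS_PER_DAY * PRICE_PER_M_OUTPUT / 1_000_000

saved = daily_cost(VERBOSE_TOKENS) - daily_cost(CAVEMAN_TOKENS)
print(f"~${saved:.2f} saved per day")  # about $30.63/day under these assumptions
```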
Even light trimming yields big wins; ultra maximizes savings for high-volume use. Test against the Caveman scale on GitHub (juliusbrussee/caveman), which has the markdown rules and a table of examples.
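Before rolling it out, measure the effect on your own prompts. Here is a minimal A/B sketch, again assuming the Anthropic Python SDK and a placeholder model name; usage.output_tokens is the figure you are billed for.

```python
import anthropic

client = anthropic.Anthropic()
QUESTION = "Why does this React component re-render on every keystroke?"
CAVEMAN = (
    "Respond caveman-brief: no articles, filler, pleasantries, or hedging. "
    "Code and error messages verbatim."
)

def output_tokens(system_prompt: str | None) -> int:
    # Compare one run with the Caveman system prompt against one without it.
    extra = {"system": system_prompt} if system_prompt else {}
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": QUESTION}],
        **extra,
    )
    return msg.usage.output_tokens

baseline = output_tokens(None)
caveman = output_tokens(CAVEMAN)
print(f"baseline={baseline} caveman={caveman} savings={1 - caveman / baseline:.0%}")
```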
Brevity Reverses LLM Performance Drop-Off
A March 2026 study ("Brevity constraints reverse performance hierarchies in language models") shows that forcing brief responses improves accuracy by 26 percentage points. Its graphs confirm the trend: shorter outputs push performance up and to the right.
Why? LLMs bloat with fluff under open-ended prompts, diluting focus. Constraints like Caveman enforce precision, countering conventional wisdom that verbosity equals quality. Ignore "you're holding it wrong" advice—instead, prompt like a caveman to get junior-dev execution from PhD-level models without token waste.