Caveman Rules Strip Output Tokens Without Losing Results

Caveman prompting forces LLMs like Claude to deliver concise responses by banning verbose phrasing, matching GrugBrain Dev's philosophy: "Why waste time say lot word when few word do trick." Apply these rules to prompts for code fixes or explanations (a system-prompt sketch follows the list):

  • Drop articles (no "a", "an", "the") and fillers (no "basically", "simply", "actually").
  • Eliminate pleasantries: No "Sure", "Certainly", "Of course", "Happy to".
  • Avoid hedging: Skip "It might be worth considering".
  • Use fragments: Full sentences unnecessary.
  • Keep technical terms intact (e.g., "polymorphism" stays unchanged).
  • Leave code blocks and error messages verbatim—Caveman applies only to explanations around code.
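
For concreteness, here is a minimal sketch of one way to pack these rules into a system prompt with the Anthropic Python SDK. The rule wording and model name are illustrative assumptions, not canonical Caveman text.

    import anthropic

    # Caveman rules packed into a reusable system prompt (wording is an
    # illustrative assumption, not canonical Caveman text).
    CAVEMAN_SYSTEM = (
        "Respond caveman style. Drop articles and filler words. "
        "No pleasantries, no hedging, no praise. Use sentence fragments. "
        "Keep technical terms, code blocks, and error messages verbatim."
    )

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # example model name; use whichever model you run
        max_tokens=300,                    # hard ceiling reinforces the brevity constraint
        system=CAVEMAN_SYSTEM,
        messages=[{"role": "user", "content": "Why does my React component re-render twice?"}],
    )

    print(response.content[0].text)
    print("output tokens:", response.usage.output_tokens)  # what you are billed for

Checking response.usage.output_tokens with and without the system prompt shows the savings on your own prompts.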

Example transformation: instead of "Sure, I'd be happy to help you with that. The issue you are experiencing is likely caused by...", prompt for "Bug in auth middleware token expiry check. Use this, not that fix." This cuts a 69-token response to 19 tokens while preserving the fix.
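
To sanity-check numbers like these on your own strings, a rough sketch; the strings below are illustrative (the verbose one is truncated) and the tokens-per-word ratio is a crude assumption, so the output will not match the 69/19 counts exactly:

    # Rough before/after comparison for the transformation above. Real counts come
    # from the API's usage.output_tokens; this word-based estimate is only a sanity check.
    verbose = (
        "Sure, I'd be happy to help you with that. The issue you are experiencing "
        "is likely caused by a stale token in your auth middleware's expiry check..."
    )
    caveman = "Bug in auth middleware token expiry check. Use this, not that fix."

    def rough_tokens(text: str) -> int:
        # Crude heuristic: roughly 1.3 tokens per whitespace-separated word.
        return round(len(text.split()) * 1.3)

    print("verbose:", rough_tokens(verbose), "tokens (rough)")
    print("caveman:", rough_tokens(caveman), "tokens (rough)")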

Scale intensity with levels (a sketch mapping each level to its rules follows the list):

  • Light: Trim fat (basic rules).
  • Full: All rules.
  • Ultra: Abbreviate common terms (DB, req, res, fn, impl), strip conjunctions, one-word answers if sufficient, arrow notation for causality (e.g., "X → Y").
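
One way to wire these levels into the system-prompt sketch above is a simple lookup; the level names come from this list, but the exact rule wording is again an assumption:

    # Map Caveman intensity levels to rule text (wording is illustrative, not canonical).
    CAVEMAN_LEVELS = {
        "light": "Drop filler words and pleasantries. Keep everything else normal.",
        "full": (
            "Drop articles, filler, pleasantries, hedging, and praise. "
            "Use fragments. Keep technical terms, code, and errors verbatim."
        ),
        "ultra": (
            "All full rules, plus: abbreviate common terms (DB, req, res, fn, impl), "
            "strip conjunctions, give one-word answers when sufficient, "
            "use arrow notation for causality (X -> Y)."
        ),
    }

    def caveman_system_prompt(level: str = "full") -> str:
        # Unknown levels fall back to the full rule set.
        return "Respond caveman style. " + CAVEMAN_LEVELS.get(level, CAVEMAN_LEVELS["full"])

    print(caveman_system_prompt("ultra"))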

Output quality matches non-Caveman responses; Claude just skips glazing you with praise like "Your insight was spot-on."

Real-World Token Savings Prove ROI

A React render bug explanation drops from 1,180 tokens to 159 tokens (87% savings) with full Caveman. Output tokens drive Claude's API costs, so this saves money directly; the verbose soliloquies Claude produces on simple topics (e.g., turning "auth is broken" into a rampage) go straight onto your bill.
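
To put those token numbers in dollar terms, a back-of-the-envelope sketch; the per-token price and request volume below are assumptions for illustration, not figures from the example above:

    # Back-of-the-envelope savings from the 1,180 -> 159 token example above.
    # Assumed output price: $15 per million tokens; verify against current pricing.
    PRICE_PER_OUTPUT_TOKEN = 15 / 1_000_000

    verbose_tokens, caveman_tokens = 1_180, 159
    requests_per_day = 500  # hypothetical volume

    saved_tokens = verbose_tokens - caveman_tokens
    daily_savings = saved_tokens * requests_per_day * PRICE_PER_OUTPUT_TOKEN
    print(f"Saved per request: {saved_tokens} output tokens")
    print(f"Saved per day at {requests_per_day} requests: ${daily_savings:.2f}")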

Even light trimming yields big wins; ultra maximizes savings for high-volume use. See the Caveman scale on GitHub (juliusbrussee/caveman) for the rules in markdown and a table of examples.

Brevity Reverses LLM Performance Drop-Off

A March 2026 study ("Brevity constraints reverse performance hierarchies in language models") shows that forcing brief responses improves accuracy by 26 percentage points. Its graphs point the same way: shorter outputs trend up and to the right on performance.

Why? LLMs bloat with fluff under open-ended prompts, diluting focus. Constraints like Caveman enforce precision, countering the conventional wisdom that verbosity equals quality. Ignore "you're holding it wrong" advice; instead, prompt like a caveman and get grug-simple, junior-dev execution from PhD-level models without the token waste.