AI Token Spend Surges 10x: Measure ROI Before Cutting

Token Spend Explodes Across Company Sizes, Hitting $500/Day Per Dev

AI token usage has jumped roughly 10x in six months at multiple firms, with no sign of slowing. At a seed-stage AI infrastructure startup, spend rose 15x, from $200 to $3,000 per developer per month. At a fintech, developers hit $500/day on Claude Code, effectively doubling the cost of an employee. At a healthcare company, one engineer spent $1,400 in a single Claude Code session. A large SaaS company (10k+ people) raised its API budgets several times in April after switching to high-effort Claude, which sent per-PR costs up sharply. A mid-sized e-commerce company (2k people) saw runaway usage with no limits in place, and now mandates Opus 4.7 as the minimum model to avoid production errors. At a 50-person Series A startup, 15 developers are heavy users, and their Claude and Claude Code spend rose rapidly.

Productivity upsides are visible too: the healthcare company's traffic grew 10x year over year without new hires, and engineering is now blocked on product and design. The infra founder sees $1-5k/month per developer as minor next to $200-400k/year in compensation, and is betting on local models in the long term.
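To put the infra founder's comparison in numbers, the sketch below expresses annual token spend as a share of annual compensation and converts the fintech's daily figure into a monthly one. It is a back-of-the-envelope helper, not a cost model; the function names and the assumption of 21 working days per month are illustrative and do not come from the reporting.

```python
def spend_share_of_comp(monthly_token_spend: float, annual_comp: float) -> float:
    """Return annual token spend as a fraction of annual compensation."""
    if annual_comp <= 0:
        raise ValueError("annual_comp must be positive")
    return (monthly_token_spend * 12) / annual_comp


def monthly_from_daily(daily_spend: float, working_days: int = 21) -> float:
    """Convert daily spend to monthly spend (21 working days is an assumption)."""
    return daily_spend * working_days


# The founder's range: $1k-5k/month in tokens against $200k-400k/year in comp.
low = spend_share_of_comp(1_000, 400_000)   # lightest usage, highest salary
high = spend_share_of_comp(5_000, 200_000)  # heaviest usage, lowest salary
print(f"{low:.0%} to {high:.0%} of comp")   # 3% to 30% of comp

# The fintech figure: $500/day on Claude Code.
fintech_monthly = monthly_from_daily(500)
print(f"${fintech_monthly:,.0f}/month")     # $10,500/month
```

Under the 21-day assumption, $500/day lands well above the founder's entire $1-5k/month range.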

Large Firms Monitor Without Hard Limits, Prioritize Business Case

At the 10k+ SaaS company, the internal coding tool defaults to the cheaper Claude Sonnet (users' model choices are not persisted) but supports every frontier model without limits, and heavy users thrive. A public infrastructure company (5k people) identifies its heaviest users but sees ROI from them; it steers people away from high-effort Claude and allows bottom-up trials of open-source models. At an 8k-person fintech, leadership warns that the current growth is unsustainable if nothing changes. An IT director at a 10k+ company reports unforecasted spikes from state-of-the-art models being used on trivial tasks, and predicts a reckoning once finance notices highly engaged developers spending hundreds of dollars a day. A 5k-person games studio rations tightly: even $200/month per developer is too much for Claude Code. A 5k-person fintech ties AI use to performance reviews, pushing developers toward maximum usage despite review bottlenecks.
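One way to read the 10k+ SaaS company's setup (a cheap default that is not persisted, with every frontier model still available) is that each new session starts on the cheaper model and a user's upgrade lasts only for that session. The sketch below models that reading; the `Session` class, `start_session` helper, and model labels are hypothetical placeholders, not the company's tool or any vendor's API.

```python
from dataclasses import dataclass

# Hypothetical labels, not real model identifiers.
DEFAULT_MODEL = "sonnet"
AVAILABLE_MODELS = {"sonnet", "opus", "other-frontier"}


@dataclass
class Session:
    """A coding-tool session whose model choice is not persisted."""
    model: str = DEFAULT_MODEL

    def switch_model(self, model: str) -> None:
        # Any supported frontier model can be chosen; no spend limit is enforced.
        if model not in AVAILABLE_MODELS:
            raise ValueError(f"unsupported model: {model}")
        self.model = model


def start_session() -> Session:
    # The previous session's choice is deliberately ignored:
    # every session begins on the cheap default.
    return Session()


first = start_session()
first.switch_model("opus")
print(first.model)   # opus
second = start_session()
print(second.model)  # sonnet
```

This keeps heavy users unrestricted while making the expensive model an explicit per-session choice rather than a sticky default.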

Mid/Small Firms Split: Spend Freely + Measure vs Optimize Early

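The "let it rip" half: a 2k-person SaaS company routes requests across models (changing the default model cut spend by 30%) and spends freely in the short term while tracking spend and outcomes monthly, ready to adjust if the two diverge. A 500-person healthcare company runs spend leaderboards and wants more usage, seeing it as massive leverage. A principal at a Series A startup plans to raise budgets and measure ROI and adoption first, deferring optimization. A VP at a 2k-person finance company dropped $100-per-user caps (users exhausted them in 3-5 days), blocked the most expensive Cursor models, and moved to pooled spend; Claude limits are being raised for critical use cases. The founder of a 700-person infrastructure company relies on self-policing, with the heaviest users topping out at about $1k/week after a caching fix addressed early spend of $10k/week, and dismisses Ralph-loop-style spending of $1k/day as junk R&D. An e-commerce company with 2k developers buys discounted tokens (tiers starting at 5% off) and sets no limits under an AI-pilled CEO. On the optimize-early side, a bootstrapped company switched from Opus to Sonnet.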
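The finance VP's move from $100-per-user caps to pooled spend can be illustrated with a small simulation: under fixed per-user caps, a heavy user is blocked once they hit $100 even while light users leave budget unused, whereas a shared pool lets heavy users draw on that slack. The class names, user names, and spend figures below are hypothetical.

```python
class PerUserCaps:
    """Each user has an independent monthly cap."""

    def __init__(self, cap_per_user: float):
        self.cap = cap_per_user
        self.spent: dict[str, float] = {}

    def charge(self, user: str, amount: float) -> bool:
        used = self.spent.get(user, 0.0)
        if used + amount > self.cap:
            return False  # blocked: this user's cap is exhausted
        self.spent[user] = used + amount
        return True


class PooledBudget:
    """All users draw from one shared monthly budget."""

    def __init__(self, total: float):
        self.remaining = total

    def charge(self, user: str, amount: float) -> bool:
        if amount > self.remaining:
            return False  # blocked: the shared pool is exhausted
        self.remaining -= amount
        return True


# Hypothetical team of 3 with the same $300 total under either policy.
users = ["heavy", "light_a", "light_b"]
caps = PerUserCaps(100.0)
pool = PooledBudget(100.0 * len(users))

# Hypothetical month: the heavy user wants $25/day for 8 days,
# and each light user spends $10 in total.
requests = [("heavy", 25.0)] * 8 + [("light_a", 10.0), ("light_b", 10.0)]
caps_ok = sum(caps.charge(u, a) for u, a in requests)
pool_ok = sum(pool.charge(u, a) for u, a in requests)
print(caps_ok, pool_ok)  # 6 10
```

Under per-user caps the heavy user is cut off after day 4, in line with the 3-5 days reported; the pool spends from the same $300 but lets that user finish. The tradeoff is that one user can now consume budget others might need, and the VP's setup pairs pooling with blocking the priciest Cursor models.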

Two Strategies Emerge: Impact-First vs Cost Controls, Plus Discounts

Strategy #1 (about half of companies): spend freely and measure usage and impact. This is a net positive for fast-growing startups that are avoiding new hires, and it avoids premature cuts before the ROI is clear. Strategy #2: use cheaper models for simple tasks, make a cheap model the non-persisted default, and impose hard caps or consent requirements. Strategy #1 adopters reject this as optimizing the wrong thing. On discounts, Cursor offers tiers starting at 5% off for $1M+ in spend, while Anthropic offers none even at $5M+/year. Negotiate custom terms; at scale, that is free upside. Looking ahead, local models (Kimi, Qwen) offer more control but require heavy hardware investment.
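Strategy #2's controls can be sketched as a routing function: simple tasks go to a cheaper model, and a request that would push a developer past a daily cap is held until they explicitly consent. This is one reading of "hard caps/consent"; the complexity score, threshold, cap, per-request costs, and model labels are all hypothetical and do not come from the companies surveyed or from any vendor's pricing.

```python
from dataclasses import dataclass
from typing import Optional

# All labels and numbers below are hypothetical placeholders, not real pricing.
CHEAP_MODEL = "cheap-model"
FRONTIER_MODEL = "frontier-model"
COST_PER_REQUEST = {CHEAP_MODEL: 0.50, FRONTIER_MODEL: 5.00}  # $ per request
COMPLEXITY_THRESHOLD = 0.7  # scores at or above this are treated as hard tasks
DAILY_CAP = 100.0           # $ per developer per day


@dataclass
class Decision:
    model: Optional[str]  # None means the request is held, not sent
    needs_consent: bool


def route(complexity: float, spent_today: float, consented: bool = False) -> Decision:
    """Choose a model for one request under Strategy #2-style controls."""
    # Cheap model for simple tasks, frontier model only for hard ones.
    model = FRONTIER_MODEL if complexity >= COMPLEXITY_THRESHOLD else CHEAP_MODEL
    # Past the daily cap, hold the request until the developer explicitly consents.
    if spent_today + COST_PER_REQUEST[model] > DAILY_CAP and not consented:
        return Decision(model=None, needs_consent=True)
    return Decision(model=model, needs_consent=False)


print(route(0.2, spent_today=10.0))
# Decision(model='cheap-model', needs_consent=False)
print(route(0.9, spent_today=98.0))
# Decision(model=None, needs_consent=True)
print(route(0.9, spent_today=98.0, consented=True))
# Decision(model='frontier-model', needs_consent=False)
```

The model choice here hinges entirely on the hypothetical complexity score; a real deployment would need its own way to estimate task difficulty.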