Tokenmaxxing Leaderboards Drive AI Waste

Leaderboards Incentivize Wasteful Token Burning

Tokenmaxxing turns AI usage into a status symbol, and the result is enormous waste. At Meta, an internal leaderboard ranked 85,000 employees by token consumption, crowning top users "Session Immortal" or "Token Legend." In 30 days, Meta burned 60.2 trillion tokens, equivalent to $900M at Anthropic's API rates and likely $100M+ even at a discount. Engineers reported massive waste: OpenClaw-like agents that produced no outcomes, AI-generated code that caused SEVs (severe outages), and top leaderboard users creating throwaway work that was visible in Trajectories (the AI prompt log).

Microsoft tracks token usage and the percentage of AI-written versus hand-written code, which pressures even new engineers to inflate their metrics: querying already-documented code via AI (10x slower), prototyping unneeded features and then discarding them, or defaulting to slow agents.

Salesforce sets monthly minimums ($100 on Claude Code, $70 on Cursor), tracked via Mac widgets and peer-comparison tools, alongside easily bypassed maximums ($250 Claude, $170 Cursor). Engineers burn tokens on irrelevant projects or calibrate their spend to sit just above their peers' averages to avoid being flagged.
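To put the Meta figures quoted above in per-token and per-employee terms, here is a back-of-the-envelope sketch in Python. It uses only the numbers in the text and makes two simplifying assumptions that are not in the source: usage is averaged evenly across all 85,000 ranked employees, and the "$100M+" discounted figure is treated as exactly $100M, so the discounted results are lower bounds.

```python
# Figures quoted in the text above.
TOTAL_TOKENS = 60.2e12        # tokens burned in 30 days
LIST_COST_USD = 900e6         # equivalent cost at Anthropic's API rates
DISCOUNTED_COST_USD = 100e6   # "likely $100M+": treated here as a floor
RANKED_EMPLOYEES = 85_000     # assumption: usage averaged over everyone ranked
DAYS = 30


def usd_per_million_tokens(cost_usd, tokens):
    """Blended price implied by a total cost and a total token count."""
    return cost_usd / tokens * 1_000_000


def summarize():
    """Derive per-token and per-employee averages from the quoted totals."""
    return {
        "list_usd_per_million_tokens": usd_per_million_tokens(
            LIST_COST_USD, TOTAL_TOKENS
        ),
        "discounted_usd_per_million_tokens": usd_per_million_tokens(
            DISCOUNTED_COST_USD, TOTAL_TOKENS
        ),
        "tokens_per_employee_per_day": TOTAL_TOKENS / RANKED_EMPLOYEES / DAYS,
        "list_usd_per_employee": LIST_COST_USD / RANKED_EMPLOYEES,
        "discounted_usd_per_employee": DISCOUNTED_COST_USD / RANKED_EMPLOYEES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, the quoted totals imply a blended list price of roughly $14.95 per million tokens and an average of about 23.6 million tokens per ranked employee per day. Real usage was certainly not uniform, so these are averages for scale only.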

These incentives prioritize token volume over value, mirroring the lines-of-code metrics of the past, which rewarded boilerplate over problem-solving. High token use signals "AI-nativity" in reviews, but it slows work and inflates bills without business impact.

Meta Scraps Its Leaderboard Amid Backlash; Microsoft's Devolves

Meta shut down its leaderboard after a report by The Information sparked a social media backlash, confirming that it incentivized waste. One long-tenured engineer speculated about the true goal: generating real-world traces to train Meta's next coding model, since a leaderboard guaranteed massive usage data despite the high cost. Microsoft's tracking started positively: distinguished engineers and VPs topped the charts despite having done little coding beforehand, which promoted experimentation. But it devolved into fear-driven tokenmaxxing, with engineers burning tokens to avoid appearing insufficiently committed to AI.

Shopify's Safeguards Prevent Abuse

Shopify's token leaderboard from early 2024 succeeded by evolving into a "usage dashboard" on internal wikis, which avoided competition. It has two key protections. Circuit breakers halt runaway agents and daily spending spikes, which has also revealed infrastructure bugs. Manual reviews of top spenders ($1,000+/month on Cursor) probe use cases such as agent workforces and catch tokenmaxxing. Focusing on the costliest tokens rather than total spend highlights deep work. Early on, the leaderboard pushed AI adoption when the tools were still experimental; now it balances encouragement with controls, celebrating productive power users without rewarding waste.
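The circuit-breaker idea described above can be sketched in a few lines of Python. This is an illustration only, not Shopify's implementation: the class name, the $200 daily cap, the 5x spike factor, and the 7-day baseline window are all hypothetical values chosen for the example.

```python
from collections import defaultdict, deque
from statistics import mean


class SpendCircuitBreaker:
    """Hypothetical per-user breaker: trips on a hard daily cap or a spike
    relative to the user's own trailing average. All thresholds are made up."""

    def __init__(self, daily_cap_usd=200.0, spike_factor=5.0, window_days=7):
        self.daily_cap_usd = daily_cap_usd
        self.spike_factor = spike_factor
        # Trailing daily totals per user, limited to the last window_days days.
        self.history = defaultdict(lambda: deque(maxlen=window_days))
        # Running total for the current day, per user.
        self.today = defaultdict(float)

    def is_tripped(self, user):
        spent = self.today[user]
        if spent > self.daily_cap_usd:
            return True  # absolute cap, e.g. a runaway agent
        past = self.history[user]
        if past:
            baseline = mean(past)
            if baseline > 0 and spent > self.spike_factor * baseline:
                return True  # sudden spike versus the user's own baseline
        return False

    def record(self, user, cost_usd):
        """Add spend for today; return True if further requests are allowed."""
        self.today[user] += cost_usd
        return not self.is_tripped(user)

    def close_day(self):
        """Roll today's totals into history and start a new day."""
        for user in set(self.today) | set(self.history):
            self.history[user].append(self.today.get(user, 0.0))
        self.today.clear()
```

A caller would check the return value of `record` before dispatching the next agent request and route tripped users to a manual review rather than silently blocking them, which is how a spike can surface an infrastructure bug instead of hiding it.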

Broader Lesson: Measure Outcomes, Not Inputs

Tokenmaxxing echoes the lines-of-code pitfall: the metric is gameable and uncorrelated with impact. Companies waste millions on busywork while the best developers solve problems efficiently, with or without AI. The rational alternative is to track outcomes (shipped features, bugs fixed) and add safeguards like Shopify's, rather than rewarding raw consumption.