Token Maxing: Big Tech's AI Metric Madness

Engineers at Meta, Microsoft, and Salesforce are 'token maxing': running wasteful AI queries to climb internal leaderboards and dodge perf-review scrutiny. The pattern echoes past lines-of-code pitfalls, yet AI still drives real individual productivity gains and broader role shifts.

Token Maxing Emerges as a Perverse Incentive

Gergely Orosz describes 'token maxing' as engineers artificially inflating AI token usage to climb internal leaderboards or meet spending targets at big tech firms. At Meta, a now-removed leaderboard sparked panic, with engineers querying agents to summarize docs they could read directly, just to boost counts. "Instead of reading the documentation I will ask the agent to summarize it for me and ask questions even though it doesn't do a good job answering it but my token count goes up," one engineer shared. Microsoft sees similar antics, like running autonomous agents to generate junk. Salesforce enforces a $175 monthly minimum spend, prompting end-of-month token binges.

This mirrors historical developer-productivity blunders, like optimizing for lines of code or PR velocity via tools like Pluralsight Flow. Orosz notes the metric is weaponized in perf evals: low token use flags 'low performers not even trying,' while high usage crowns 'innovators.' Layoff fears amid cuts at companies like Block amplify the paranoia, even if token usage and job security are uncorrelated. And even after Meta removed its leaderboard, token maxing persists among risk-averse engineers in high-paying roles.

What started as playful experimentation has curdled into cultural weirdness, driven by leadership equating token spend with AI adoption. Six months ago, Orosz was hearing from CTOs whose teams shunned early AI tools like Cursor on legacy codebases, which prompted top-down usage mandates. Coinbase CEO Brian Armstrong fired an engineer after a one-week AI-usage ultimatum, sending a clear message.

AI's Net Productivity Despite Flaws

Despite the abuses, Orosz affirms that AI boosts individual output, though team-level gains lag. He recounts a dinner with Dutch CTOs where engineers resisted pre-o1 models but leaders pushed adoption anyway, fearing a competitive lag behind companies like Anthropic, where Claude reportedly writes much of the code and revenue is surging.

Goodhart's Law looms: once a metric becomes a target, it distorts behavior. Still, simply tracking usage does nudge adoption. Orosz compares it to LeetCode interviews: big tech selects for 'smart people willing to put up with bullshit,' a filter now extending to token grinding. Startups ignore it and focus on shipping; big tech's scale demands measurement. The toy sketch below shows how a token-count target distorts behavior.
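
A minimal Python sketch of that distortion, with entirely hypothetical names and telemetry: ranking individuals by raw token count rewards waste, while a blunter adoption-rate measure is harder to game by bingeing.

    # Toy illustration of why token counts are a gameable metric (Goodhart's Law).
    # All engineers and numbers here are hypothetical.
    from collections import defaultdict

    usage_events = [
        # (engineer, tokens_used) over one week -- hypothetical telemetry
        ("alice", 12_000), ("bob", 900), ("carol", 0),
        ("bob", 450_000),  # bob "token maxing": junk summarization queries
    ]

    def token_leaderboard(events):
        """Ranking by raw tokens rewards waste: bob tops the board."""
        totals = defaultdict(int)
        for engineer, tokens in events:
            totals[engineer] += tokens
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    def adoption_rate(events, team_size):
        """Share of engineers with any usage -- harder to game by bingeing."""
        active = {engineer for engineer, tokens in events if tokens > 0}
        return len(active) / team_size

    print(token_leaderboard(usage_events))  # bob looks like the "innovator"
    print(f"adoption: {adoption_rate(usage_events, team_size=4):.0%}")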

Productivity puzzles persist. A small METR study (16 developers) found participants believed AI sped them up by roughly 20% when it actually slowed them down by roughly 20%, with one outlier. Orosz highlights Simon Willison's insight: "AI is just so hard to get good at. There's no manual." Unlike with compilers, understanding the theory (attention mechanisms) doesn't make you better at the tools; mastery demands constant workflow iteration. Teams thrive when they are 'low ego, open to learning, leave your priors behind.' Non-technical collaborators gain the most via coding agents, becoming 'serverless developers' who bypass engineer bottlenecks.

Engineer Roles Expand into Orchestrators

AI accelerates pre-existing shifts: tester and DevOps roles collapsed into engineering years ago at VC-funded startups, and now product duties are folding in too. Even at John Deere, a nearly 200-year-old company, teams are shrinking from two-pizza to one-pizza size. Early-career engineers face senior expectations: business awareness and planning.

The 'everyone's an engineering manager' trope irks Orosz: agents involve none of the people drama of managing careers or conflicts. "You've become more removed from the product and you have to deal with people problems," he says of true management. Instead, it's tech lead work: orchestrating agents like a 'mech suit' (per DHH), enabling parallel tasks with fast feedback, as in the sketch below. Mitchell Hashimoto runs just two agents; others parallelize more. There is no universal pattern yet.
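
A minimal orchestration sketch in that spirit, assuming a hypothetical agent-cli command standing in for whatever coding-agent tool a team actually uses; the point is parallel tasks with the human as the feedback loop, not any specific product.

    # Minimal sketch of orchestrating two coding agents in parallel.
    # "agent-cli" is a hypothetical command; substitute your actual agent tool.
    import asyncio

    async def run_agent(task: str) -> str:
        """Launch one agent on a task and capture its output."""
        proc = await asyncio.create_subprocess_exec(
            "agent-cli", "--task", task,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()
        return f"[{task!r} exited {proc.returncode}]\n{out.decode()}"

    async def main() -> None:
        # Two parallel agents (Hashimoto's reported setup); add more only as
        # your review bandwidth allows -- the human remains the bottleneck.
        results = await asyncio.gather(
            run_agent("fix flaky integration test"),
            run_agent("add pagination to /orders endpoint"),
        )
        for result in results:
            print(result)

    if __name__ == "__main__":
        asyncio.run(main())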

Big Tech's Internal AI Infra Frenzy

Customer-facing AI launches remain sparse (Uber is a case in point), but companies are rebuilding internal infrastructure: coding agents integrated into monorepos, MCP gateways wired into service discovery, AI-aware review of risky code, and on-call tooling. Uber, Airbnb, Intercom, and Meta lead; mid-size firms follow.
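
What an 'MCP gateway in service discovery' can look like is easiest to see in a sketch. Everything below is hypothetical: the Model Context Protocol defines client-server messages, not this in-house registry and routing layer.

    # Hypothetical sketch: a registry mapping MCP servers into service
    # discovery, so internal agents find approved tools through one gateway.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class McpService:
        name: str           # logical tool name agents request
        endpoint: str       # internal address from service discovery
        allowed_teams: frozenset  # coarse access control at the gateway

    REGISTRY = {
        "code-search": McpService("code-search",
                                  "mcp://codesearch.internal:7443",
                                  frozenset({"platform", "payments"})),
        "ticket-db": McpService("ticket-db",
                                "mcp://tickets.internal:7443",
                                frozenset({"support-eng"})),
    }

    def resolve(tool: str, team: str) -> str:
        """Gateway lookup: route an agent's tool request, enforcing access."""
        svc = REGISTRY.get(tool)
        if svc is None or team not in svc.allowed_teams:
            raise PermissionError(f"{team} may not use {tool}")
        return svc.endpoint

    print(resolve("code-search", "payments"))  # mcp://codesearch.internal:7443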

Orosz sees the value: low-risk, hands-on AI practice; context-aware RAG that beats vendor context limits on massive codebases (sketched below); and easy funding, since 'agent experience' trumps plain developer platforms. Shopify pioneered the pattern, snagging GitHub Copilot pre-launch for 3,000 engineers and trading expense and churn for a six-month edge. Big tech's moats justify the investment, and when it's executed well, startups should beware.
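
For the RAG point, a deliberately tiny retriever over code chunks, using bag-of-words cosine similarity as a stand-in for the embedding models and vector stores a production monorepo system would use:

    # Toy retrieval-augmented generation (RAG) retriever over code chunks.
    # Token-frequency cosine similarity stands in for real embeddings.
    import math
    import re
    from collections import Counter

    def embed(text: str) -> Counter:
        """Crude 'embedding': token frequency counts."""
        return Counter(re.findall(r"[a-zA-Z_]\w*", text.lower()))

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    # Hypothetical code chunks indexed from a monorepo.
    chunks = {
        "billing/invoice.py": "def compute_invoice(order): apply_tax(order.total)",
        "auth/session.py":    "def refresh_session(token): validate_jwt(token)",
    }
    index = {path: embed(src) for path, src in chunks.items()}

    def retrieve(query: str, k: int = 1) -> list:
        """Return the k most relevant chunks to put in the model's context."""
        q = embed(query)
        ranked = sorted(index, key=lambda p: cosine(q, index[p]), reverse=True)
        return ranked[:k]

    print(retrieve("how do we apply tax to an invoice total"))
    # -> ['billing/invoice.py']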

Key Takeaways

  • Track AI usage company-wide to boost adoption, but avoid leaderboards or hard targets that spawn token maxing.
  • Prioritize individual practice: AI mastery takes time, no manual—iterate workflows relentlessly.
  • Empower non-engineers with agents to unlock 'serverless developers,' amplifying team velocity beyond solo coder gains.
  • Evolve into agent orchestrators, not managers: focus on tech lead skills for faster feedback loops.
  • Build custom infra early if you operate at scale: integrate MCP gateways and RAG for monorepos to outpace off-the-shelf tools.
  • In startups, ignore token metrics—ship value; big tech's incentives select for grinders who innovate anyway.
  • Stay open-minded: 'Leave your priors behind' for low-ego teams extracting max AI value.
  • Trade short-term expense/churn for competitive leads, like Shopify's Copilot bet.

Notable quotes:

  • Gergely Orosz: "Low performer with low impact and a low token count clearly not even trying."
  • Via engineer anecdote: "People just want to not be in the bottom 25% or bottom 50% for token count."
  • Simon Willison (quoted by Orosz): "AI is just so hard to get good at. I've been doing it for two years and I'm still figuring out what works."
  • Gergely Orosz: "It's more like a mech suit where you can do seven things at once."
  • Gergely Orosz: "Understanding the theory will not make you better at using the tools which is an absolute mindfuck."

