AI Productivity Paradox: Wrong Metrics Hide Gains
High AI adoption has not shown up in productivity statistics: time lags, outdated measurement frameworks, shallow workflow integration, and cases where AI actually slows workers all mask the gains; redesigning the surrounding systems is what unlocks real value.
The Apparent Disconnect: Surging Adoption, Stagnant Stats
AI use is exploding: McKinsey reports 88% of organizations apply it in at least one function, and Bain notes 40% of software-development pilots scale to production (vs. 32% in customer service). U.S. AI investment hit $109B, with adoption reportedly up 340% in recent years. Yet productivity growth hovers at 2.3%, barely above the 2.2% historical average; no macro acceleration appears, despite the hype. Marco van Hurne calls this the 'AI Productivity Paradox': inputs and excitement surge, outputs stay flat. The reason: adoption ≠ transformation. Pilots get deployed, but workflows, roles, data, and metrics remain unchanged, yielding dashboards and costs without gains.
"Adoption is not transformation. 'We use AI' often means 'someone opened ChatGPT twice, created a project and renamed it ‘knowledge management’'"—van Hurne highlights how superficial use inflates stats while real change lags, trapping firms in pilots.
J-Curve Time Lag: Upfront Investments Drag Before Payoff
General-purpose technologies like AI follow a J-curve (per Brynjolfsson, Rock, and Syverson): an initial dip driven by investment in 'complementary capital' (organizational redesign, training, data prep, R&D) before the gains arrive. Productivity drops in the short term as firms pour resources into intangibles that the books treat as pure costs. An MIT/U.S. Census study found manufacturing AI adopters saw productivity drops, with gains appearing only after 4+ years.
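A minimal numeric sketch of that J-curve dynamic, with all figures hypothetical rather than drawn from the studies cited: measured productivity dips while intangible investment is being expensed, then climbs once the complementary capital starts paying off.

```python
# Illustrative J-curve: every number here is an assumption, not study data.

def measured_productivity(year: int,
                          baseline: float = 100.0,
                          invest_years: int = 4,
                          intangible_drag: float = 3.0,
                          post_payoff_gain: float = 8.0) -> float:
    """Output per measured unit of input, with intangible spend expensed as cost."""
    if year <= invest_years:
        # Redesign, training, and data work consume resources but produce
        # little measurable output yet, so the stat dips below baseline.
        return baseline - intangible_drag
    # Once the complementary capital is in place, gains compound.
    return baseline + post_payoff_gain * (year - invest_years)

for year in range(1, 9):
    print(year, measured_productivity(year))
```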
Key buckets:
- Workflows/roles: Redesign decision rights; the failure mode is unchanged chaos, amplified.
- Skills: Train/hire; track via completion rates.
- Data: Clean/govern; avoid 'confident garbage'.
- Experimentation: Structured loops, not one-offs.
"AI doesn’t create productivity, systems do, and AI only amplifies whatever system you already have, whether that system is a ‘well-run operation’ or a ‘chaos with lots of meetings’"—van Hurne stresses AI as accessory; build org/people/data/learning around it, or get costs without ROI.
Early signals: redesigned roles cut cycle times 20-40%; poor data spikes error rates 2-3x. Without this complementary investment, CFOs see an 'investment hangover'.
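One way to make those early signals trackable is a simple scorecard per team or workflow; the sketch below is an assumption about what to record, with field names and thresholds chosen for illustration, not any standard.

```python
from dataclasses import dataclass

@dataclass
class ComplementaryCapitalSignals:
    """Leading indicators for one team or workflow (illustrative fields)."""
    roles_redesigned: int             # decision rights / job descriptions actually changed
    training_completion_rate: float   # 0..1, share of staff through AI training
    data_quality_score: float         # 0..1, e.g. share of records passing validation
    experiments_per_quarter: int      # structured experiment loops, not one-off pilots

    def warnings(self) -> list:
        """Rough health checks; thresholds are assumptions to tune locally."""
        out = []
        if self.roles_redesigned == 0:
            out.append("no role/workflow redesign: AI bolted onto the old process")
        if self.training_completion_rate < 0.5:
            out.append("training lagging adoption")
        if self.data_quality_score < 0.7:
            out.append("risk of 'confident garbage' outputs")
        if self.experiments_per_quarter < 1:
            out.append("no structured experimentation loop")
        return out

print(ComplementaryCapitalSignals(0, 0.3, 0.9, 2).warnings())
```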
Measurement Breakdown: Task-Level Wins Lost in Aggregates
GDP tools, built for physical goods, miss AI's intangible, task-level impact. Issues:
- No AI bucket: The BEA notes AI spending hides inside 'software publishing/IT services' and proposes satellite accounts to break it out.
- Job vs. task: Stats track jobs/industries; AI hits tasks (e.g., faster drafts but longer reviews). 'Project Iceberg': visible job layer hides task automation.
- Intangibles undervalued: WIPO/Deloitte find intangibles (datasets, training) surging but expensed as costs rather than capitalized as assets, a short-term drag despite long-term value.
Task-level gains get absorbed into the surrounding system: 10 minutes saved drafting can turn into 20 minutes of verifying plus extra coordination, a net loss end-to-end. National stats understate the effect further because AI embeds in broad spending categories.
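The arithmetic is trivial but worth making explicit per workflow; a minimal helper, assuming you log both the time a step saves and the time it adds downstream:

```python
def net_task_delta(minutes_saved: float, minutes_added_downstream: float) -> float:
    """Positive = net time saved end-to-end; negative = net loss."""
    return minutes_saved - minutes_added_downstream

# The drafting example: 10 min saved writing, 20 min added verifying/coordinating.
print(net_task_delta(10, 20))  # -10, a loss the task-level "win" never shows
```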
"We’re trying to track a high-tech, intangible economy using frameworks built for factories and physical capital. No wonder the stats look unimpressed."—van Hurne critiques 'meat thermometer on a cloud', urging task/end-to-end outcome tracking.
Workflow Redesign Failure: Pilots Die at Integration
Most firms bolt AI onto broken processes: faster outputs just create downstream friction (e.g., escalations, debugging). MIT Sloan's 'work-backward' approach: deconstruct jobs into tasks, assign each to AI, human, or AI+human, rebuild the workflow end-to-end, and measure outcomes (time/quality/cost/risk).
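A sketch of that decomposition in data-structure form; the step names, fields, and example flow are hypothetical, but the pattern (decompose, assign an owner, measure end-to-end rather than per step) follows the work-backward framing above.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    owner: str           # "ai", "human", or "ai+human"
    minutes: float       # measured, not estimated
    rework_rate: float   # share of outputs needing downstream fixes

@dataclass
class Workflow:
    name: str
    steps: list = field(default_factory=list)

    def outcome(self) -> dict:
        """End-to-end measures: the unit that actually matters for productivity."""
        return {
            "total_minutes": sum(s.minutes for s in self.steps),
            "worst_rework_rate": max(s.rework_rate for s in self.steps),
        }

# Hypothetical claims-intake flow, rebuilt after task decomposition:
flow = Workflow("claims_intake", [
    Step("extract fields from documents", "ai", 2, 0.05),
    Step("validate against policy", "ai+human", 6, 0.02),
    Step("decide and notify customer", "human", 10, 0.01),
])
print(flow.outcome())
```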
The pilot funnel collapses at integration: ideas and pilots are plentiful, then data cleanup, compliance, and change management kill most of them, and only a sliver scales. Production demands clean data, monitoring, and ownership; a pilot that 'feels faster' won't cut it.
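What 'production rigor' can mean in practice: a small go/no-go gate list a pilot must pass before scaling. The gates and thresholds below are assumptions for illustration, not a standard checklist.

```python
# Hypothetical promotion gates; every key and threshold is an assumption.
PRODUCTION_GATES = {
    "data.validated_source_of_truth": True,
    "data.quality_score_at_least": 0.9,
    "monitoring.output_sampling_enabled": True,
    "monitoring.error_alerting_enabled": True,
    "ownership.named_owner": True,
    "guardrails.human_review_for_high_risk": True,
}

def blocking_gaps(pilot_status: dict) -> list:
    """Return the gates this pilot still fails."""
    gaps = []
    for gate, required in PRODUCTION_GATES.items():
        actual = pilot_status.get(gate)
        if isinstance(required, bool):
            ok = actual is True
        else:
            ok = isinstance(actual, (int, float)) and actual >= required
        if not ok:
            gaps.append(gate)
    return gaps

# A pilot that "feels faster" but skipped the unglamorous work:
print(blocking_gaps({"data.validated_source_of_truth": True,
                     "data.quality_score_at_least": 0.6}))
```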
"It is easier to change the way the organization works, than to change the underlying technology."—van Hurne flips ERP wisdom: tool-forward pilots = 'graveyard'; redesign yields 30-50% cycle drops, quality rises.
Perception Trap: AI Can Slow Experts, Users Overconfident
METR's RCT found frontier AI tools (Claude) slowed experienced devs: verification overhead (fixing output took longer than the time it saved), quality mismatch (suggestions ignore codebase norms), and context limits (naive suggestions in large repos). Users felt faster but delivered slower.
Mechanisms: over-reliance skips thinking; coordination rises. Negative productivity hides in 'confident garbage'.
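A minimal sketch of how a team could check for this backfire itself, comparing measured completion times against self-reported speedup; the numbers are invented and the real METR design is far more careful, but the point is to trust the clock over the vibe.

```python
from statistics import mean

# Hypothetical completion times (minutes) for matched tasks.
with_ai = [52, 61, 48, 70, 55]
without_ai = [45, 50, 47, 58, 49]

measured_change = mean(with_ai) / mean(without_ai) - 1
self_reported_change = -0.20  # survey says "feels about 20% faster"

print(f"measured: {measured_change:+.0%}, self-reported: {self_reported_change:+.0%}")
# Measured positive (slower) while self-reported negative (faster) means the
# perception trap is live: verify net speed, not gut feel.
```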
"Giving developers access to frontier AI tools made them slower at completing tasks."—van Hurne cites METR, warning complex work backfires without redesign.
Key Takeaways
- Track complementary capital early: monitor role changes, training uptake, data quality, experiment velocity.
- Measure task/end-to-end: don't stop at job-level aggregates; log time/quality pre/post-AI per workflow.
- Work backward: task-decompose jobs, reassign AI/human, rebuild flows before pilots.
- Demand production rigor: clean data, guardrails, monitoring—not demo vibes.
- Watch for backfire: RCT-test AI in real tasks; verify net speed, not gut feel.
- Build intangibles as assets: capitalize training/datasets for true ROI view.
- Redesign first: AI amplifies systems—fix chaos or amplify it.
- Use satellite metrics: task logs, cycle times over GDP proxies.
- Iterate structured: kill 'one pilot, one funeral'; loop learnings.
- Align incentives: tie bonuses to outcomes, not tool installs.