AI Automates 12% of Tasks in White-Collar Jobs, 44% Needs Judgment
The PASF PADE benchmark maps jobs to four automation zones: the average white-collar role is 12% Zone I (easy for AI) and 44% Zone III (judgment-heavy, hard for AI). Executive assistants are 55% automatable; software engineers sit 83% in Zone III, safe for now. The focus shifts to job purpose over tasks.
PASF PADE: Mapping Enterprise Work to Automation Zones
Marco van Hurne, running an 'agentification factory' at a big tech company and at Eigenvector, created the PASF PADE benchmark to measure AI's practical automation ceiling. Facing uneven results in enterprise-scale AI deployments—some processes automate smoothly, others explode with exceptions—he classified all knowledge work into four zones based on structure, judgment needs, and risk.
Zone I (easy, 27% of processes): Highly routinized tasks like data entry or basic transactions. Current AI agents handle these reliably with scripts or simple prompts, yielding quick wins.
Zone II (moderate): Semi-structured workflows needing coordination, e.g., customer service decision trees or IT ticketing. Requires workflow smarts; agents manage most if architected right. Zones I+II form the 35% 'current ceiling' for reliable automation.
Zone III (hard, 30-50% of knowledge work): Context-dependent judgment like financial analysis amid market shifts or ambiguous software requirements. AI approximates but fails unpredictably—'Russian Roulette' where one error wipes out wins. This zone houses expensive humans, driving massive economic incentives.
Zone IV (human-only): Accountability tasks like board decisions or ethical calls, demanding a 'human pulse' for liability.
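The four zones boil down to three questions about a task: is it structured, does it need judgment, and does a human bear accountability? A minimal sketch of that decision logic (the `Task` fields and ordering are my assumptions, not van Hurne's actual scoring rules):

```python
from dataclasses import dataclass

# Illustrative sketch of the four-zone classification; the criteria
# names are assumptions, not van Hurne's published methodology.
@dataclass
class Task:
    name: str
    structured: bool            # highly routinized, rule-based?
    needs_judgment: bool        # context-dependent interpretation?
    bears_accountability: bool  # legal/ethical liability on a human?

def classify_zone(task: Task) -> str:
    if task.bears_accountability:
        return "IV"   # human-only: liability demands a 'human pulse'
    if task.needs_judgment:
        return "III"  # AI approximates but fails unpredictably
    if task.structured:
        return "I"    # scripts or simple prompts handle it reliably
    return "II"       # semi-structured: needs workflow orchestration

print(classify_zone(Task("data entry", True, False, False)))         # I
print(classify_zone(Task("financial analysis", False, True, False))) # III
```

Note the ordering: accountability trumps judgment, which trumps structure, mirroring how Zone IV tasks stay human even when parts of them look routinized.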
Decision chain: Vendor demos ignored real-world variance, so van Hurne rejected whiteboard theory in favor of empirical benchmarking. Tradeoffs: Zone I/II gains are low-hanging but volume-limited; Zone III's riches come with compliance nightmares. He details this in his prior post, “The Real Story Behind Enterprise Scale Process Agentification.”
"Think of current generative AI as a revolver with one bullet. Every time you run a process, the hammer cocks. Five times out of six, it fires clean and everybody celebrates. But the sixth time, when the chamber is loaded, that single failure erases all five wins. You are playing Russian Roulette with your company." (Van Hurne's Zone III analogy, highlighting why AI can't yet scale there without governance.)
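The arithmetic behind the analogy is worth making explicit: if one failure in six really does erase the gains of five clean runs, the process nets out to zero. A toy expected-value calculation (the unit values are illustrative, not van Hurne's figures):

```python
# Toy expected-value model of the "revolver" analogy.
# Assumed numbers: each clean run yields +1 unit of value, and the
# one-in-six failure costs -5 units (wiping out the five wins).
p_fail = 1 / 6
gain_clean = 1.0
loss_fail = -5.0

expected_value = (1 - p_fail) * gain_clean + p_fail * loss_fail
print(round(expected_value, 6))  # 0.0 -- the wins and the loss cancel
```

Under these assumptions the agent produces no net value at all, which is why Zone III deployment hinges on capping the downside (governance, escalation) rather than raising the hit rate alone.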
Job-Level Analysis: Task Breakdown Reveals Limited AI Reach
Building on PASF PADE, van Hurne dug deeper over weekends, using standardized job frameworks to decompose white-collar roles into tasks, then mapping each task to a zone. The result: a predictive tool showing automation potential per job (paused due to €50/day token costs at ai-automations.my).
Key results across 10 roles:
- Average: 12% Zone I, >44% Zone III.
- Executive assistants: 55% automatable (Zones I/II).
- Software engineers: 83% Zone III (safe, except juniors).
- Legal advisors: 100% Zones III/IV (fully human).
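The per-job percentages above fall out of a simple aggregation: decompose the role into tasks, assign each task a zone and a time weight, then sum. A minimal sketch (the task list and weights for an executive assistant are invented for illustration, chosen to reproduce the 55% figure):

```python
# Hypothetical task -> (zone, time-weight) mapping; a real analysis
# would draw tasks from a standardized job framework.
exec_assistant = {
    "calendar management": ("I",   0.30),
    "travel booking":      ("I",   0.15),
    "inbox triage":        ("II",  0.10),
    "meeting prep briefs": ("III", 0.25),
    "stakeholder liaison": ("III", 0.20),
}

def zone_share(role: dict, zones: set) -> float:
    """Fraction of the role's time spent in the given zones."""
    return sum(weight for zone, weight in role.values() if zone in zones)

automatable = zone_share(exec_assistant, {"I", "II"})
print(f"{automatable:.0%} automatable")  # 55% under these toy weights
```

The same function applied with `{"III", "IV"}` gives the judgment-heavy remainder, which is the share that stays human.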
Before: roles were defined by their tasks, with routine heavy lifting. After: AI strips out the routines (e.g., juniors' market analysis), shifting humans toward purpose and orchestration. Juniors and entry-level workers are hit hardest, per the ILO (2.3% of full jobs lost) and MIT task-level papers. No full-job wipeout yet, but cumulative FTE savings reshape teams.
Why this method? Process-level tools already existed, but jobs required granular task translation to get accurate percentages, so he rejected vague estimates in favor of structured frameworks. Tradeoffs: high token burn for the analysis, and blue-collar work is out of scope. The payoff is a 'job apocalypse calculator' for enterprises.
"AI at its current state does not displace full jobs. It displaces tasks of a job instead." (Van Hurne's core thesis, backed by ILO/MIT, explaining why augmentation > replacement for now.)
Eigenvector's Zone III Assault: Boring, Governed AI
Van Hurne's research at Eigenvector targets Zone III's 30-50% gap via applied engineering, not frontier models. Problem: Clever agents hallucinate in context-heavy work. Solution: Goal-Directed Governance Agent—constrained, monitored, escalation-heavy.
Architecture: goal + rules + limits; the agent escalates anything unknown. Pilots are promising under controlled conditions; the emphasis is 'boring stability' (predictability, tools, reasoning) over raw smarts. Tradeoffs: less flashy than demos, but audit-ready for high-stakes work (like aviation systems).
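The "goal + rules + limits, escalate unknowns" pattern can be sketched as a thin wrapper around any model call. This is a hypothetical interface (Eigenvector's actual agent is not public); the action set, limit, and invoice domain are invented for illustration:

```python
from typing import Callable

# Sketch of a governance wrapper: constrained action set, hard limits,
# an audit trail on every step, and escalation for anything else.
ALLOWED_ACTIONS = {"approve_invoice", "flag_for_review"}
MAX_AMOUNT = 10_000  # hard limit; above this a human must decide

def governed_step(propose: Callable[[dict], dict], case: dict) -> dict:
    proposal = propose(case)  # the (untrusted) model's suggestion
    audit = {"case": case, "proposal": proposal}  # audit trail first
    if proposal.get("action") not in ALLOWED_ACTIONS:
        return {**audit, "outcome": "escalate", "reason": "unknown action"}
    if case.get("amount", 0) > MAX_AMOUNT:
        return {**audit, "outcome": "escalate", "reason": "over limit"}
    return {**audit, "outcome": proposal["action"]}

# A stub model that always tries to approve:
result = governed_step(lambda c: {"action": "approve_invoice"},
                       {"amount": 25_000})
print(result["outcome"])  # escalate -- the limit overrides the model
```

The design choice is that the guardrails sit outside the model: the agent cannot talk its way past the limit, and every decision, including escalations, lands in the audit record.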
Neuro-symbolic endgame: Neural flexibility + symbolic rules for judgment; self-optimizes within guardrails (chess-tested). Rejected general self-improvement for bounded learning.
"The AI systems that actually matter in high-stakes environments are never the flashy ones. They are the dull, reliable, obsessively-monitored systems that do one thing correctly over and over again while generating audit trails that would make a regulatory lawyer weep with joy." (On 'Boring AI' for Zone III, contrasting demo culture.)
Tokenomics: Optimizing the Hidden Cost of Scale
AI isn't free: tokens compound at enterprise scale. After a year of 'burning money,' van Hurne built Token Minimization Governance, which architects agents around low-token paths (e.g., 500 vs. 2,000 tokens per task).
Zone-specific: Zone I favors high-volume efficiency; Zone III justifies spending more on verification. Integrates with PASF PADE for tradeoff decisions. Upcoming: swarm simulations with Olivier Rikken (Zero-Human-Company).
Tradeoffs: Cheaper ≠ always better; Zone III errors cost more than tokens.
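At enterprise volume, the 500-vs-2,000-token gap compounds fast. A back-of-envelope model (the price and daily volume are illustrative assumptions, not van Hurne's numbers):

```python
# Toy tokenomics: compare two agent architectures at scale.
# All constants are illustrative assumptions.
PRICE_PER_1K_TOKENS = 0.01  # assumed blended $/1K tokens
RUNS_PER_DAY = 50_000       # assumed Zone I transaction volume

def daily_cost(tokens_per_task: int) -> float:
    return RUNS_PER_DAY * tokens_per_task / 1000 * PRICE_PER_1K_TOKENS

lean, heavy = daily_cost(500), daily_cost(2000)
print(f"lean ${lean:.0f}/day vs heavy ${heavy:.0f}/day")
# In Zone III the calculus flips: extra verification tokens are cheap
# compared to one unrecoverable error.
```

Under these assumptions the lean path runs $250/day against $1,000/day, a 4x gap that scales linearly with volume, which is exactly the zone-by-zone tradeoff the governance framework is meant to surface.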
Adaptation Imperative: From Tasks to Purpose
No job vanishes, but its routines do: humans become 'babysitters' of AI, shifting toward value and purpose (credit: Fatih Boyla). Entry-level roles are vulnerable; seniors thrive on judgment. Future Zone III breakthroughs buy time, but don't sit idle.
"In my view, people should adapt by focusing on the purpose of the role, not its tasks." (Van Hurne's advice post-analysis, urging proactive reskilling.)
Key Takeaways
- Classify processes/jobs via PASF PADE zones to find real AI wins (start Zone I/II for 35% gains).
- Decompose roles into tasks for precise automation %—e.g., target exec admin routines first.
- Build 'boring' governed agents for Zone III: goals + escalations > autonomy.
- Optimize tokenomics early: Model spend by zone/volume to avoid CFO revolt.
- Reskill for purpose: AI handles tasks, humans own judgment/accountability.
- Pilot neuro-symbolic for self-optimization within rails—test on chess-like domains.
- Juniors/entry-level at risk; invest in augmentation over fear.
- Economic driver: Zone III's expensive humans make it priority #1.