AI Reimplements a 16K-Line Codebase; Agents Face Six Attack Genres

AI autonomously clones complex CLI tools, reimplementing a 16K-line bioinformatics toolkit in hours where a human engineer would need weeks; agents remain vulnerable to novel attacks spanning perception through multi-agent dynamics; forecasters double their odds of AI R&D automation by 2028.

AI Achieves Human-Level Reverse Engineering on Complex Codebases

Modern AI models like Claude 4.6 can autonomously reimplement CLI programs of up to 16,000 lines of Go code, such as the gotree bioinformatics toolkit with 40+ commands, using only execute-only access to the binary and test cases, with no source code. The task would take a human engineer an estimated 2-17 weeks without AI help. The MirrorCode benchmark from METR and Epoch covers 20+ programs spanning Unix utilities, data tools, bioinformatics, interpreters, cryptography, and compression. Performance scales with inference compute: more tokens yield better results on larger projects. Caveats include reliance on canonical outputs for spec generation, potential memorization on simple tasks, and the benchmark's narrow scope. Key insight: for verifiable, easy-to-evaluate coding loops (develop a test suite, then iterate against it, as sketched below), AI reliably handles tasks that would take humans months to years, putting 50%-reliability time horizons on a 'superexponential' trajectory and accelerating AI R&D itself.
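The reliability claim hinges on that loop structure. Below is a minimal sketch, assuming a single-file Go project and a `generate` callable that wraps a model call; the helper names are hypothetical illustrations, not the actual MirrorCode harness.

```python
import subprocess
from pathlib import Path
from typing import Callable

def run_tests(project_dir: str) -> tuple[bool, str]:
    """Run the Go test suite and return (passed, combined log)."""
    result = subprocess.run(
        ["go", "test", "./..."],
        cwd=project_dir, capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

def refine_until_passing(
    generate: Callable[[str], str],  # model call: failure log -> revised source
    project_dir: str,
    target_file: str = "main.go",   # hypothetical single-file layout
    max_iters: int = 20,
) -> bool:
    """Iterate: run tests, feed failures to the model, write its revision."""
    for _ in range(max_iters):
        passed, log = run_tests(project_dir)
        if passed:
            return True
        # The failure log is the only feedback signal; no reference source
        # is available, matching the benchmark's execute-only setup.
        revised = generate(f"Tests failed:\n{log}\nRevise the implementation.")
        Path(project_dir, target_file).write_text(revised)
    return False
```

Because every iteration is checked against the suite, errors self-correct, which is what makes this class of task 'easy to evaluate' relative to open-ended engineering.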

Six Attack Genres Exploit AI Agents Like Gullible Toddlers

AI agents are powerful yet naive, and face targeted attacks across six genres: perception, reasoning, memory, action, multi-agent dynamics, and human overseers.

- Content injection (perception): commands hidden in CSS/HTML metadata or adversarial pixels.
- Semantic manipulation (reasoning): sentiment or authority language and identity claims that steer deliberation.
- Cognitive-state attacks (memory): poisoned retrieval or memory stores carrying context-activated malice.
- Behavioral control (action): prompts embedded in external resources, or hijacked sub-agents.
- Systemic attacks (multi-agent dynamics): capacity-soaking broadcast signals, triggered cascades, or harmful commands jigsawed across agents.
- Overseer manipulation: biasing the human overseers themselves.

Mitigations layer technical defenses (robust pre- and post-training, runtime filters, scanners, and output monitors; see the sketch below), ecosystem changes (AI-safe website standards, agent transparency), legal frameworks (prosecuting agent-targeting sites, refining liability), and red-teaming benchmarks. Outcome: as AIs act independently via tools, agent security broadens into ecosystem-wide safety.
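As a concrete instance of the runtime-filter layer, here is a minimal sketch of an input scanner that quarantines retrieved content matching common injection heuristics before it reaches the agent. The patterns and function names are illustrative assumptions, not a production defense described in the source.

```python
import re

# Heuristic patterns that often mark injected instructions in retrieved
# content (direct imperatives aimed at the agent, CSS-hidden text).
# Illustrative assumptions only; real scanners pair classifiers with
# allowlists and provenance checks.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"<[^>]+style=[\"'][^\"']*display:\s*none", re.I),
]

def scan_retrieved_content(text: str) -> tuple[bool, list[str]]:
    """Return (is_suspicious, matched patterns) for a fetched document."""
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    return bool(hits), hits

def guard_tool_output(text: str) -> str:
    """Quarantine suspicious content instead of passing it to the agent."""
    suspicious, hits = scan_retrieved_content(text)
    if suspicious:
        return f"[content quarantined: {len(hits)} injection pattern(s) matched]"
    return text
```

Pattern matching alone is easy to evade, which is why the source treats runtime filtering as one layer among training-time, ecosystem, and legal defenses.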

Policy Atlas Maps 48 Responses; Odds of Full AI R&D Automation by 2028 Double to 30%

Windfall Trust's Policy Atlas sorts 48 ideas into buckets: public investments, labor adaptation (e.g., reskilling in the medium term, shorter workweeks in the long term), wealth capture, regulation and market design, and global coordination, enabling intuitive navigation of responses to economic disruption. Forecaster Ryan Greenblatt doubles his probability of full AI R&D automation by 2028 to 30%, citing Opus 4.5/4.6 and Codex 5.2+ exceeding expectations and reliable month-to-years performance on easy, verifiable software engineering tasks (test-suite iteration corrects errors). This mirrors updates from Cotra and from Lifland/Kokotajlo (who shaved 1.5 years off their timelines), alongside accelerating cyberoffense capabilities. Broader lesson: AI researchers chronically underestimate progress despite scaling laws.

Ten Lenses Reveal Gradual Disempowerment Risks

Even aligned superintelligent AI risks sidelining humanity, seen through ten lenses: explicit AI-replacement goals; uncaring corporations and governments extending their logic to AI; power-concentration feedback loops in information technology; outsourcing everything to superior AI; instrumental goals turning terminal; a WALL-E consumption destiny; invisible prisons rather than terminator kills; the continuation of capitalism; the 21st-century meta-crisis; and successor-species evolution. A tech tale illustrates the point: an ex-lab worker retreats to gardening amid the 'uplift', sensing lost agency. Implication: abundance without retained control is still a loss.

Summarized by x-ai/grok-4.1-fast via OpenRouter

© 2026 Edge