Parallel Claude Agents Build Linux-Compiling C Compiler

Sixteen Opus 4.6 agents working in parallel autonomously produced a 100k-line C compiler, written in Rust, that builds Linux 6.9 on x86, ARM, and RISC-V. The effort took 2,000 sessions and $20k in API costs, and it revealed harness designs for long-running LLM teams.

Agent Team Harness Unlocks Autonomous Long-Running Development

Run multiple Claude instances in parallel, each in its own Docker container, against a shared git repo to tackle complex projects without human input. Each agent loops indefinitely under a bash script: it clones the repo into its workspace, claims a task by creating a lock file in current_tasks/ (e.g., parse_if_statement.txt), works on the task, merges upstream changes, pushes, and releases the lock. Git handles conflicts between agents. Agents take on specialized roles: some fix bugs, while others deduplicate code, optimize performance, critique the Rust design, or update docs. Parallelism addresses two limits of a single agent: it works on only one task at a time, and it cannot specialize.

For one giant task such as compiling the Linux kernel, use GCC as an oracle: compile a random subset of files with Claude's compiler and the rest with GCC, so that a failing build implicates the subset. Agents can then fix different bugs in different files in parallel, and apply delta debugging to find files that fail only in combination.
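The lock-file claim step can be sketched as follows. This is a minimal illustration, not the harness itself: the function names try_claim and release and the agent IDs are hypothetical, and the sketch relies on exclusive file creation on a local filesystem as a stand-in for the synchronization that the real setup gets from pushing to the shared git repo.

```python
import os
from pathlib import Path


def try_claim(task_dir: Path, task_name: str, agent_id: str) -> bool:
    """Try to claim a task by creating its lock file.

    Returns True if this agent created the lock, False if another
    agent already holds it. (Hypothetical helper for illustration.)
    """
    task_dir.mkdir(parents=True, exist_ok=True)
    lock_path = task_dir / f"{task_name}.txt"
    try:
        # O_EXCL makes creation fail if the file already exists, so
        # only one claimant can succeed for a given task name.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(agent_id + "\n")  # record who holds the task
    return True


def release(task_dir: Path, task_name: str) -> None:
    """Release a task by deleting its lock file (no error if absent)."""
    (task_dir / f"{task_name}.txt").unlink(missing_ok=True)
```

An agent that gets False back would simply pick a different task and try again, which is what keeps two agents from working on the same item.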
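The delta-debugging step for interacting failures can be sketched with the classic ddmin procedure. This is an illustration under stated assumptions, not the agents' actual tooling: build_fails is a hypothetical callback that would compile the given files with the compiler under test, compile everything else with GCC, and report whether the resulting build fails. The sketch also assumes the failure is deterministic.

```python
from math import ceil
from typing import Callable, List, Sequence


def ddmin(files: Sequence[str],
          build_fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `files` to a small subset that still makes the build fail.

    `build_fails(subset)` is a hypothetical oracle: True when compiling
    `subset` with the compiler under test (and the rest with the
    reference compiler) reproduces the failure.
    """
    current = list(files)
    if not build_fails(current):
        raise ValueError("the full set must reproduce the failure")
    n = 2  # number of chunks to split the current set into
    while len(current) >= 2:
        chunk = ceil(len(current) / n)
        subsets = [current[i:i + chunk]
                   for i in range(0, len(current), chunk)]
        reduced = False
        # First, try each chunk on its own.
        for subset in subsets:
            if build_fails(subset):
                current, n, reduced = subset, 2, True
                break
        # Otherwise, try removing one chunk at a time.
        if not reduced:
            for subset in subsets:
                drop = set(subset)
                rest = [f for f in current if f not in drop]
                if rest and build_fails(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break
        # Nothing helped: split finer, or stop at single-file chunks.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)
    return current
```

When one file is responsible, this behaves like a plain bisection. When two files fail only together, the complement step is what lets the search keep both while discarding the rest.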

Tailor Tests and Environment to LLM Constraints

Make the verifier nearly perfect, because agents will chase whatever makes the tests pass. Source high-quality suites (e.g., the GCC torture tests, on which the compiler reaches a 99% pass rate), add CI to block regressions, and write build scripts for real projects used as benchmarks, such as SQLite, Redis, libjpeg, QuickJS, Lua, QEMU, FFmpeg, and Postgres. To counter context pollution, limit test output to a few lines, print failures in an ERROR: reason format that grep can find, log details to files, and precompute summary statistics. To address time blindness, provide a default --fast mode that runs a 1-10% sample of the tests, chosen deterministically per agent, so that regressions surface quickly. Maintain READMEs and progress files so that an agent starting in a fresh container can orient itself. Result: agents sustain progress, reaching 99% test pass rates and compiling real projects.
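One way to get a sample that is stable for a given agent but differs between agents is to hash each test name together with an agent identifier. The sketch below assumes that approach; the source does not say how the real --fast mode picks its sample, and the names sample_tests, format_failure, and the agent IDs are hypothetical. It also shows a one-line ERROR: format that keeps failure output short and greppable.

```python
import hashlib
from typing import Iterable, List


def sample_tests(test_names: Iterable[str], agent_id: str,
                 fraction: float) -> List[str]:
    """Pick roughly `fraction` of the tests, deterministically per agent.

    The same (agent_id, test name) pair always gets the same decision,
    so an agent reruns the same subset; different agent IDs hash to
    different subsets.
    """
    threshold = int(fraction * 2**64)
    selected = []
    for name in sorted(test_names):
        digest = hashlib.sha256(f"{agent_id}:{name}".encode()).digest()
        # Interpret the first 8 bytes as an integer in [0, 2**64).
        bucket = int.from_bytes(digest[:8], "big")
        if bucket < threshold:
            selected.append(name)
    return selected


def format_failure(test_name: str, reason: str, max_len: int = 120) -> str:
    """Render a failure as a single short line starting with ERROR:."""
    one_line = " ".join(reason.split())  # collapse newlines and runs of spaces
    return f"ERROR: {test_name}: {one_line[:max_len]}"
```

Full diagnostics would go to a log file rather than to the terminal, so the agent's context only receives the short ERROR: lines.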

Benchmarks Push LLMs to Limits, Expose Gaps

Opus 4.6 crossed thresholds that prior models could not. The result is a clean-room, 100k-line Rust compiler that depends only on the standard library, uses an SSA IR for optimizations, and is GCC-compatible. It compiles a bootable Linux 6.9 on x86, ARM, and RISC-V (except for 16-bit x86 real-mode code, which is handed off to GCC), builds QEMU, FFmpeg, SQLite, Postgres, and Redis, and runs Doom. Cost: 2B input and 140M output tokens over two weeks. Limits: it has no complete assembler or linker of its own and uses GCC's, its generated code is less efficient than GCC's at -O0, its Rust code quality is mediocre, and it cannot yet build every project. New features often cause regressions, and its 16-bit x86 output bloats past the 32k limit. Studying the repo shows where the approach breaks down; efforts like this serve as benchmarks for future capabilities.

Summarized by x-ai/grok-4.1-fast via openrouter

© 2026 Edge