Scale Agents with Planners and Workers for Week-Long Coding

Separate planning and execution roles let hundreds of agents collaborate on massive projects, generating 1M+ lines of code over weeks while minimizing conflicts and drift.

Avoid Flat Coordination: Locks and Optimism Fail at Scale

Single agents excel at focused tasks but stall on complex projects that would take a human team months. Running agents in parallel without hierarchy creates bottlenecks: with lock-based coordination, 20 agents deliver the throughput of only 2 or 3, because of time spent waiting on locks, locks that are never released, and agents that fail while holding one. Optimistic concurrency (reads are free; a write fails if the state changed since the read) is less brittle but does not fix risk aversion. Flat agents dodge hard tasks, make small safe edits, and churn without end-to-end ownership, so work stalls indefinitely.
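The optimistic scheme described above can be sketched in a few lines. This is a minimal single-process illustration, not the system's actual implementation: `SharedState`, `StaleWriteError`, and `claim_task` are hypothetical names, and a real multi-agent setup would need the version check and the write to happen atomically (for example, a compare-and-swap in a database).

```python
from dataclasses import dataclass, field


class StaleWriteError(Exception):
    """Raised when the state changed after the caller last read it."""


@dataclass
class SharedState:
    """Hypothetical shared coordination state guarded by a version counter."""
    version: int = 0
    claims: dict[str, str] = field(default_factory=dict)  # task id -> agent id

    def read(self) -> tuple[int, dict[str, str]]:
        # Reads take no lock: return the current version plus a snapshot.
        return self.version, dict(self.claims)

    def write(self, seen_version: int, task_id: str, agent_id: str) -> None:
        # The write fails if anyone has written since the caller's read.
        if seen_version != self.version:
            raise StaleWriteError(
                f"read at v{seen_version}, state is now v{self.version}"
            )
        self.claims[task_id] = agent_id
        self.version += 1


def claim_task(state: SharedState, task_id: str, agent_id: str,
               max_attempts: int = 3) -> bool:
    """Try to claim a task, re-reading and retrying after a stale write."""
    for _ in range(max_attempts):
        version, claims = state.read()
        if task_id in claims:
            return False  # another agent already owns this task
        try:
            state.write(version, task_id, agent_id)
            return True
        except StaleWriteError:
            continue  # state moved on; re-read and try again
    return False
```

No agent ever holds a lock, so a crashed agent cannot block the others. The scheme only makes coordination cheaper, though; it does nothing about which tasks agents choose to pick up.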

Dynamic self-coordination through shared files amplifies these issues, because the path through a project is ambiguous at the outset, while rigid upfront planning is impractical for the same reason. Structured roles are the alternative that keeps work moving.

Use Recursive Planners and Focused Workers

A pipeline architecture divides the labor. Planners scan the codebase, generate tasks, and recursively spawn sub-planners so that subsystems are planned in parallel. Workers each claim one task and execute it fully, without looking at other workers' tasks or the big picture, then push their changes and handle any conflicts themselves.
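The division of labor can be illustrated with a toy, sequential sketch. Everything here is an assumption made for illustration: the `Task` type, the nested-dict project layout, and the `plan` and `execute` functions are hypothetical, and real planners and workers would be model-driven agents running concurrently.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    subsystem: str
    description: str


def plan(name: str, subtree: dict, tasks: list[Task]) -> None:
    """Hypothetical planner: emit a task for a leaf subsystem, or hand each
    child subsystem to a sub-planner (modeled here as a recursive call)."""
    if not subtree:
        tasks.append(Task(name, f"implement {name}"))
        return
    for child_name, child_subtree in subtree.items():
        plan(child_name, child_subtree, tasks)


def execute(worker_id: str, task: Task) -> str:
    """Hypothetical worker: receives exactly one task and nothing else,
    so it cannot peek at other tasks or the overall plan."""
    return f"{worker_id} pushed: {task.description}"


# Toy project layout (an assumption): nested dicts, empty dict marks a leaf.
project = {"parser": {"html": {}, "css": {}}, "layout": {}}

tasks: list[Task] = []
plan("browser", project, tasks)

# Workers claim tasks from a shared queue, one task per claim.
queue = deque(tasks)
workers = ["w1", "w2"]
results = []
turn = 0
while queue:
    task = queue.popleft()
    results.append(execute(workers[turn % len(workers)], task))
    turn += 1
```

The point of the shape is that `execute` takes a single `Task`: a worker's view is limited by construction, while only planners ever see the tree.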

Each cycle ends with a judge agent deciding whether to continue; the next iteration then starts fresh to prevent tunnel vision. This scales to hundreds of concurrent workers on a single branch with minimal conflicts, even on codebases of 1M lines and 1,000 files. Because workers resolve merge conflicts themselves, the separate integrator role that bottlenecked earlier designs is no longer needed.
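A rough sketch of the plan, work, judge cycle follows. The function names, the `max_cycles` bound, and the backlog standing in for codebase state are all assumptions for illustration; in the system described, each role would be an agent rather than a plain function.

```python
from typing import Callable


def run_cycles(
    plan: Callable[[], list[str]],
    work: Callable[[str], str],
    judge: Callable[[list[str]], bool],
    max_cycles: int = 10,
) -> int:
    """Run plan -> work -> judge cycles and return how many cycles ran."""
    cycles = 0
    while cycles < max_cycles:
        cycles += 1
        tasks = plan()  # each cycle plans from current state, not old context
        results = [work(task) for task in tasks]
        if not judge(results):  # the judge decides whether to continue
            break
    return cycles


# Toy usage: a backlog stands in for the state of the codebase.
backlog = ["a", "b", "c", "d", "e"]


def plan_two() -> list[str]:
    return backlog[:2]  # plan at most two tasks per cycle


def do_task(task: str) -> str:
    backlog.remove(task)
    return f"done {task}"


def keep_going(results: list[str]) -> bool:
    return bool(backlog)  # continue while work remains


cycles_run = run_cycles(plan_two, do_task, keep_going)
```

Because `plan` takes no arguments, nothing carries over between iterations except the shared state itself, which is one simple way to model the fresh start described above.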

Tested on three projects: (1) a web browser built from scratch (about 1 week; fastrender on GitHub), which is hard to build even though a screenshot makes it look simple; (2) Cursor's Solid-to-React migration (3+ weeks, +266K/-193K edits, passing CI); (3) a 25x video rendering speedup in Rust, with zoom/pan springs and motion blur (merged to production). The trillions of tokens spent on these runs suggest the approach is viable for ambitious goals.

Prioritize Prompts, Role-Specific Models, and Simplicity

For long runs, GPT-5.2 outperforms Opus 4.5, which tends to stop sooner and take shortcuts, and even the coding-tuned GPT-5.1-Codex: it follows instructions better, stays focused, resists drift, and works more precisely. Match models to roles; for example, GPT-5.2 is a better planner than GPT-5.1-Codex. Prompts dictate roughly 80% of behavior, so tune them heavily for coordination, avoiding pathological behavior, and sustained focus. They matter more than the harness or the choice of model.

Simplify ruthlessly. Designs borrowed from distributed systems or organizational structures often fail for agents; a middle-ground amount of structure curbs conflicts, duplicated work, and drift without becoming fragile. Drop extras such as integrators, since workers suffice. Future work: event-driven planners, bounded run lengths, and automatic resets to counter drift.

Summarized by x-ai/grok-4.1-fast via openrouter

© 2026 Edge