Recreate Parallel Coding with Markdown Skills and Sub-Agents

Git WorkTrees give each agent its own isolated checkout, so several agents can work on tasks in parallel without interfering with one another. That makes it possible to run grids of agents, or a model competition (Best Agent) whose outputs, such as frontend changes, are compared before merging via PRs. Cursor's original implementation spanned 15,000 lines of code covering tree creation, isolation, setup scripts, judging, system reminders, and cleanup of the disk bloat left by hundreds of trees.

Replace this with two primitives: agent skills (instruction sets) and sub-agents. The /worktree command (a server-controlled skill prompt) instructs the agent to create a WorkTree with git (git worktree add), run the user-configured setup scripts, and operate only inside the new tree, with path handling that works across Windows, Linux, and macOS. Aggressive reminders such as "NEVER work outside this directory" keep the agent from escaping. The entire skill is ~200 lines of Markdown.
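The steps the skill describes in prose can be sketched in a few lines of Python. This is only an illustration of what the agent is asked to do, not Cursor's implementation: the function names are hypothetical, and the only real interface used is the git worktree add command with its -b flag. The containment check shows what "operate only inside it" would mean if it were enforced in code rather than by reminders.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_add_command(worktree_dir: Path, branch: str) -> List[str]:
    """Build the git command that creates a new branch checked out in a new directory."""
    return ["git", "worktree", "add", "-b", branch, str(worktree_dir)]


def create_worktree(repo: Path, worktree_dir: Path, branch: str) -> Path:
    """Run `git worktree add` from inside the repository (hypothetical helper)."""
    subprocess.run(worktree_add_command(worktree_dir, branch), cwd=repo, check=True)
    return worktree_dir.resolve()


def is_inside(worktree_dir: Path, candidate: Path) -> bool:
    """Return True if `candidate` resolves to a location within `worktree_dir`.

    Relative candidates are interpreted relative to the worktree root, so a
    path such as ../main/app.py is detected as an escape.
    """
    root = worktree_dir.resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

A guard like is_inside is the hard-enforcement counterpart to the skill's reminders; the skill itself relies on the model following instructions.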

For Best Agent (/bestagent), a 40-line skill spawns one sub-agent per model (e.g., Claude, Grok, Composer, GPT, Opus), each in its own WorkTree. The parent agent waits for them to finish, then grades the outputs in a table, critiques the differences (e.g., "These two did the same; Opus added X"), and lets users mix changes (e.g., "Combine Opus UI with GPT logic"). Two further commands round out the workflow: /apply-worktree merges changes and /delete-worktree cleans up.
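The fan-out-then-grade pattern can be sketched as follows. Everything here is a stand-in: fake_sub_agent replaces a real sub-agent call and returns canned output, the grade function is supplied by the caller (in the real skill the parent model does the grading), and the WorkTree paths are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    model: str
    worktree: str  # path of the WorkTree this sub-agent worked in (hypothetical)
    summary: str


def fake_sub_agent(model: str, task: str) -> Result:
    """Stand-in for a real sub-agent run; returns canned output."""
    return Result(model, f"../wt-{model.lower()}", f"{model}: done with '{task}'")


def best_agent(
    models: List[str],
    task: str,
    run: Callable[[str, str], Result] = fake_sub_agent,
    grade: Callable[[Result], int] = lambda r: 0,
) -> List[Result]:
    """Run one sub-agent per model in parallel, wait for all, rank by grade."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        results = list(pool.map(lambda m: run(m, task), models))
    return sorted(results, key=grade, reverse=True)


def as_table(results: List[Result], grade: Callable[[Result], int]) -> str:
    """Render the ranked results as a plain-text comparison table."""
    rows = ["model | worktree | score"]
    rows += [f"{r.model} | {r.worktree} | {grade(r)}" for r in results]
    return "\n".join(rows)
```

Because the parent collects every Result before grading, it has all outputs available at once, which is what lets it compare or combine them.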

This approach trusts the LLM to maintain isolation (vibes-based rather than hard enforcement), but it delivers a near-identical UX: isolated edits, PRs, and visual diffs.

Gains: Lower Maintenance, Broader Compatibility

Deleting 15,000 LoC behind an advanced feature that only power users touch frees engineering time. Users can switch to WorkTrees mid-chat with a slash command, which was impossible before because of UI clutter. Multi-repo setups now work seamlessly: the agent creates a tree per repo and opens multiple PRs. Best Agent judging also improves, because the parent has the full context of every sub-agent when stitching diffs together, unlike the prior single-model lock-in.

Perceived speed matches the native implementation, with no actual slowdown, and the skill can be iterated on through server-side prompts without shipping app updates.

Tradeoffs and Fixes: Reliability via Evals and RL

Cons: Models drift over long sessions (e.g., Haiku often escapes to the primary checkout; Composer and Grok do better). The feature can feel slower because users watch tree creation happen in chat. Discoverability drops: there is no dropdown, so users must already know about /worktree.

Mitigate with evals built on Braintrust and the headless Cursor CLI: score each run by whether the work happened in the WorkTree (good) or in the primary checkout (bad). Patterns in the failures inform prompt tweaks and system reminders. Add WorkTree tasks to the RL pipeline for Composer 3+ (Composer 2's thousands of tasks included none), and share feedback with the labs.
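The scoring rule can be sketched as a small function. This is an assumption about what such a scorer might look like, not the actual eval: it does not call Braintrust or the Cursor CLI, the function names are hypothetical, and it assumes the list of edited files has already been collected as absolute, normalized POSIX-style paths.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def _within(path: str, root: str) -> bool:
    """True if `path` lies under `root` (purely lexical; no filesystem access)."""
    try:
        PurePosixPath(path).relative_to(PurePosixPath(root))
        return True
    except ValueError:
        return False


def escaped_paths(edited_paths: Iterable[str], worktree_root: str) -> List[str]:
    """Edits that landed outside the WorkTree, e.g. in the primary checkout."""
    return [p for p in edited_paths if not _within(p, worktree_root)]


def score_run(edited_paths: Iterable[str], worktree_root: str) -> float:
    """Fraction of edits inside the WorkTree: 1.0 is fully isolated, 0.0 is none.

    A run with no edits scores 0.0, since no work happened in the WorkTree.
    """
    paths = list(edited_paths)
    if not paths:
        return 0.0
    return (len(paths) - len(escaped_paths(paths, worktree_root))) / len(paths)
```

Checking against the WorkTree root rather than the primary root matters when the tree is nested inside the primary checkout, where every path would otherwise look like a primary edit.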

Future: native WorkTrees in Cursor 3.0's agentic UI (chat-optimized, no editor); evals and RL for skills; and git-independent primitives (faster, less disk usage, support for non-git repos). Mixed forum feedback reflects the change in habits, but the power-user focus prioritizes leanness.