VS Code's Agent Loop: Tools, Sub-Agents, and Hidden Optimizations
VS Code Copilot's agent loop is a dynamic while loop built from model-tuned system prompts, automatic context, tools, and sub-agents that route cheaper models to tasks like retrieval. Relentless harness optimization has lifted committed-code success from roughly 52% to 90%.
The Agent Loop: A Continuous While Loop Powered by Tools and Context
Brian breaks down the agent loop as a giant while loop that kicks off when you hit enter on a prompt. Each iteration sends an API request to the model with four core components: a dynamically built system prompt, explicit and implicit context, available tools, and your user prompt. The loop continues as the model observes previous outputs, decides on the next action—text response or tool call—and iterates until it issues a stop message.
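The loop Brian describes can be sketched in a few lines. This is a minimal illustration, not VS Code's actual internals: the model call is mocked, and names like `ModelTurn`, `callModel`, and `agentLoop` are invented for the sketch.

```typescript
type ToolCall = { name: string; args: Record<string, string> };
type ModelTurn =
  | { kind: "text"; content: string }
  | { kind: "tool"; call: ToolCall }
  | { kind: "stop" };

interface LoopState {
  systemPrompt: string; // dynamically built per model family
  context: string[];    // explicit mentions plus implicit signals
  history: string[];    // grows each iteration with tool results
}

// Mock model: asks for one search, then issues a stop.
function callModel(state: LoopState): ModelTurn {
  if (!state.history.some(h => h.startsWith("tool:search"))) {
    return { kind: "tool", call: { name: "search", args: { query: "Button" } } };
  }
  return { kind: "stop" };
}

// Mock tool execution: VS Code runs the tool and returns its output.
function runTool(call: ToolCall): string {
  return `tool:${call.name} -> found 3 matches for "${call.args.query}"`;
}

function agentLoop(userPrompt: string): string[] {
  const state: LoopState = {
    systemPrompt: "You are a coding agent.",
    context: ["open editor: hello.tsx"],
    history: [`user: ${userPrompt}`],
  };
  // The giant while loop: observe, decide, act, repeat until stop.
  while (true) {
    const turn = callModel(state);
    if (turn.kind === "stop") break;
    if (turn.kind === "tool") state.history.push(runTool(turn.call));
    else state.history.push(`assistant: ${turn.content}`);
  }
  return state.history;
}
```

Each pass through the loop is one API request; the "observe previous outputs" step is simply the growing `history` being re-sent.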
"Imagine you just basically have a giant while loop... every there's many many interactions with the model," Brian explains. System prompts aren't static; they're optimized per model family through pre-launch tuning with providers like Anthropic and OpenAI, plus post-launch A/B tests and evaluations. Responsible AI safety prompts are universal, but the rest adapts: "There is no one prompt for Copilot... it's dynamically built and optimized specifically for that model."
Context is key. Explicit mentions like "hello.tsx" get included, alongside implicit signals: open editors, running terminals, environment info, dates. Tools form the loop's foundation: built-ins for search and file reads/edits, plus extensions like MCP servers. The model picks a tool via its schema (description plus parameters), VS Code executes it, and the results are fed back. James notes the explosion of options: bypass, autopilot, planning mode, custom agents, reasoning levels. "The set of tools and options have grown," he observes, leading to visible research phases like grepping files for button placements.
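The "schema plus execution" split can be sketched as follows. Field names here are illustrative; the key point from the episode is that only the schema (name, description, parameters) reaches the model, while execution stays local in VS Code.

```typescript
interface ToolSchema {
  name: string;
  description: string;                // the model picks tools by reading this
  parameters: Record<string, string>; // parameter name -> description
}

interface Tool {
  schema: ToolSchema;
  execute(args: Record<string, string>): string;
}

// A hypothetical built-in file-read tool.
const readFile: Tool = {
  schema: {
    name: "read_file",
    description: "Read the contents of a file in the workspace",
    parameters: { path: "workspace-relative file path" },
  },
  execute: args => `// contents of ${args.path}`,
};

// Only the schemas are serialized into the model request.
function toolManifest(tools: Tool[]): ToolSchema[] {
  return tools.map(t => t.schema);
}
```

MCP servers fit the same shape: they contribute additional `Tool` entries whose schemas get merged into the manifest.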
Trade-offs abound. More tools or context fill the window, degrading choice quality—like humans with too many options. "Just like a human, when you give people more choices, their ability to pick the right choice degrades," Brian warns. VS Code counters with unseen optimizations: custom models refine tool lists or handle agentic code retrieval, ensuring edits land correctly.
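One way to picture the tool-list refinement Brian alludes to: score each tool's description against the prompt and keep only the top few. VS Code reportedly uses custom models for this; the keyword-overlap scorer below is purely an illustrative stand-in.

```typescript
interface ToolInfo {
  name: string;
  description: string;
}

// Keep the k tools whose descriptions best overlap the user prompt,
// shrinking the choice set the model has to reason over.
function refineTools(prompt: string, tools: ToolInfo[], k: number): ToolInfo[] {
  const words = new Set(prompt.toLowerCase().split(/\W+/));
  const scored = tools.map(t => ({
    tool: t,
    score: t.description.toLowerCase().split(/\W+/).filter(w => words.has(w)).length,
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, k).map(s => s.tool);
}
```

Fewer options in the window means less of the choice degradation Brian warns about, at the cost of occasionally pruning a tool the model would have wanted.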
Sub-Agents: Delegation as a Tool Call with Model Routing
Sub-agents emerge when the main agent needs specialized work. They're not magic; the main agent treats them as a tool. It fills parameters (goal, fresh context), VS Code spins up a sub-loop, and results return like a function output. "A sub-agent is basically like this main agent can decide, 'I want to go basically do this workflow, run this agent loop again with fresh context,'" Brian says.
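The "sub-agent is just a tool" idea can be sketched like this. Everything here is illustrative: `runAgentLoop`, the model names, and the routing rule are stand-ins for whatever VS Code actually does.

```typescript
// A stand-in for spinning up a nested agent loop; a real one would
// iterate exactly like the main loop, just with fresh context.
function runAgentLoop(goal: string, model: string): string {
  return `[${model}] completed: ${goal}`;
}

const subAgentTool = {
  name: "run_sub_agent",
  description: "Delegate a focused task to a sub-agent with fresh context",
  execute(args: { goal: string }): string {
    // Fresh context: nothing from the parent history carries over.
    // Retrieval-style goals can be routed to a fast, cheap model,
    // while synthesis stays on the frontier model (illustrative rule).
    const model = args.goal.includes("find") ? "cheap-fast-model" : "frontier-model";
    return runAgentLoop(args.goal, model);
  },
};
```

From the main agent's perspective the result is indistinguishable from any other tool output, which is why the delegation feels like an ordinary function call.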
Users spot branches: main loop on Opus 4.6, sub-agents on Haiku or Mini. No bait-and-switch; it's deliberate routing. Retrieval or planning benefits from fast, cheap models; synthesis needs heavyweights. "We're all of our incentives... to build the best possible experience... we will not pull fast one on you," Brian assures. Instructions append as text (global or glob-patterned), skills as optional tool-like reads. MCP adds tools dynamically.
James recounts Twitter confusion: "I see a bunch of sub-agents exploring... but it's using a different model... 'Are you guys pulling a fast one?'" Brian ties it to primitives: model decides via prompts, which can explicitly push sub-agents. Corrections append as text, letting the model adapt—or derail, hence kill-and-restart advice.
Harness Optimizations: From 52% to 90% Code Success
The "harness"—prompts, context, tools, custom models—makes VS Code's agents shine versus CLI or others. Brian's team (15-20 people) obsesses over trajectories: not just resolution, but optimal paths in fewer steps. "With Opus 4.6, I think we're getting 90% of Opus 4.6 code in our harness committed... GPT-4.1, when I first started... 52, 53%. This is the improvement we see in 1 year."
They built VS SWE-bench, a cleaner SWE-bench alternative avoiding training data pollution. Pre-launch: weeks/months of access, multi-runs to cut variance, trajectory analysis. Post-launch: A/B tests capture real-world wins. Demand spikes (10 agents/session) strain capacity; new models like Opus 4.7 start raw, improve fast. "Today is like the worst day to use that model because it's a brand new model... it's an infant state."
Micro-optimizations abound: chat naming via cheap LLM, commit/PR generation as mini-loops, next edits. Even titles: "We're passing the conversation history to a cheap model... to get a title back very quickly." Custom models tackle hard sub-problems like context gathering.
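The chat-naming micro-optimization is simple to sketch: a one-shot call to a cheap model over the conversation history. `cheapModel` below is a hypothetical stand-in that just extracts a short title, not a real API.

```typescript
// Stand-in for a fast, cheap model: derive a short "title" from the
// first user message in the prompt. A real implementation sends the
// prompt to an actual small model.
function cheapModel(prompt: string): string {
  const firstUser = prompt.split("\n").find(l => l.startsWith("user:")) ?? "user: Chat";
  return firstUser.slice("user: ".length, "user: ".length + 30);
}

// The micro-loop: pack the history into one prompt, get a title back.
function nameChat(history: string[]): string {
  const prompt = "Summarize this chat in a short title:\n" + history.join("\n");
  return cheapModel(prompt);
}
```

Commit messages and PR descriptions follow the same pattern: a small, bounded loop with a cheap model, invisible to the user.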
"There's an enormous amount of work that goes in not just to partnering with our model friends, but optimizing those prompts... so that we give you the best results," Brian emphasizes. Continuous loops: shipped model tweaks, new model queues, generic tool refinements.
User Control and When to Intervene
Prompting matters: explicit sub-agent requests or corrections build history. But loops can go off the rails: each next token is predicted from the prior ones, so a bad path compounds. Kill early: "That's why it's important to kill it, back up and understand why do you think it's going down this path."
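The kill-early advice can be framed as a simple guard: if the agent keeps issuing the same tool call, the bad prior tokens are compounding and it's time to stop and rethink. The threshold here is illustrative, not anything VS Code documents.

```typescript
// Return true once any identical tool call repeats maxRepeats times,
// a crude signal that the loop has derailed and should be killed.
function shouldKill(toolCalls: string[], maxRepeats = 3): boolean {
  const counts = new Map<string, number>();
  for (const call of toolCalls) {
    const n = (counts.get(call) ?? 0) + 1;
    counts.set(call, n);
    if (n >= maxRepeats) return true;
  }
  return false;
}
```

In practice the "detector" is you watching the chat output; the point is to intervene before the repetition itself becomes more context that reinforces the bad path.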
Brian stresses foundations for advanced use: instructions append text, skills add context-on-demand, tools expand options judiciously. Over-prompting or tool-bloating hurts; trust the harness but steer explicitly.
Key Takeaways
- Start with basics: Agent loop = while loop of prompt (system + context + tools + user) → model decision → execute → repeat until stop.
- Use explicit prompts for sub-agents or corrections; they append as text, letting the model adapt.
- Don't fear model switches in sub-agents—cheap models excel at retrieval/planning, saving cost and time.
- Kill looping agents early; bad paths compound via token prediction.
- Expect new models to improve fast—wait a week post-launch for tuned prompts and capacity.
- Limit tools/context to avoid choice paralysis; VS Code's custom refiners help behind the scenes.
- Monitor trajectories in chat outputs to debug: search → read → edit patterns signal healthy loops.
- Leverage implicit context (open files, terminals) for better relevance without extra prompting.
- Trust harness differences: VS Code's 90% success beats raw models via optimized paths.
- Experiment with modes (planning, autopilot) but ground in loop understanding for custom agents.