Building Multi-Agent Systems: When to Skip the LLM

The Hybrid Architecture: Determinism Meets Intelligence

Multi-agent systems often fall into the trap of routing every decision through an LLM. The "Race Condition" project demonstrates that the most robust systems are hybrid: they use LLMs for high-level reasoning and judgment, and deterministic code for computationally expensive or mission-critical tasks. By offloading NP-hard problems (like marathon route planning) to established algorithms, the system becomes more reliable, unit-testable, and considerably faster.
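
As a rough illustration of what "offloading to an established algorithm" can look like, here is a minimal sketch of a deterministic route planner. The road graph, the `plan_route` name, and the exhaustive depth-first search are all assumptions made for this example; the project's actual algorithm is not described here, and a real course planner would need a smarter search on anything larger than a toy graph.

```python
from typing import Dict, List, Tuple

# Hypothetical road network: node -> {neighbor: distance in km}.
Graph = Dict[str, Dict[str, float]]

ROADS: Graph = {
    "A": {"B": 10.0, "C": 15.0},
    "B": {"A": 10.0, "C": 12.0, "D": 20.0},
    "C": {"A": 15.0, "B": 12.0, "D": 18.0},
    "D": {"B": 20.0, "C": 18.0},
}


def plan_route(graph: Graph, start: str, finish: str,
               target_km: float) -> Tuple[List[str], float]:
    """Return the simple path from start to finish closest to target_km.

    Exhaustive depth-first search: exponential in the worst case, but
    fully deterministic, so the same input always yields the same route.
    """
    best_path: List[str] = []
    best_len = float("inf")

    def dfs(node: str, path: List[str], length: float) -> None:
        nonlocal best_path, best_len
        if node == finish:
            if abs(length - target_km) < abs(best_len - target_km):
                best_path, best_len = list(path), length
            return
        # Sorted iteration keeps tie-breaking stable across runs.
        for nxt, dist in sorted(graph[node].items()):
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path, length + dist)
                path.pop()

    dfs(start, [start], 0.0)
    if not best_path:
        raise ValueError(f"no route from {start} to {finish}")
    return best_path, best_len


route, length_km = plan_route(ROADS, "A", "D", target_km=42.195)
```

Because the function is pure, a unit test can pin its output exactly, which is not possible with a probabilistic model response.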

The `before_model_callback` Pattern

The core technical innovation in this system is its use of the `before_model_callback` feature in the Google Agent Development Kit (ADK). The callback lets developers intercept the agent's lifecycle before the model is invoked. By injecting deterministic logic at this stage, the system can return tool calls or state updates immediately and skip the LLM call entirely. The agent's lifecycle management, telemetry, and observability stay intact, while the latency and cost of an LLM call disappear.
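
The control flow can be sketched without the framework. The code below is not ADK code: `MiniAgent`, `ModelResponse`, `ToolCall`, and `set_pace` are hypothetical stand-ins written for this example, and the pacing rule is invented. It only mirrors the idea that a callback returning a response causes the model call to be skipped, while the tool call still passes through the agent's normal, traceable path.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class ModelResponse:
    tool_calls: List[ToolCall] = field(default_factory=list)


@dataclass
class MiniAgent:
    """Toy stand-in for an agent runtime (hypothetical, not the ADK API)."""
    name: str
    model: Callable[[Dict[str, Any]], ModelResponse]
    tools: Dict[str, Callable[..., Any]]
    before_model_callback: Optional[
        Callable[[Dict[str, Any]], Optional[ModelResponse]]] = None
    trace: List[str] = field(default_factory=list)

    def run(self, state: Dict[str, Any]) -> List[Any]:
        self.trace.append("turn_start")
        response = None
        if self.before_model_callback is not None:
            response = self.before_model_callback(state)
        if response is None:
            # Callback declined: fall through to the real model.
            self.trace.append("model_call")
            response = self.model(state)
        else:
            self.trace.append("model_skipped")
        results = []
        for call in response.tool_calls:
            # Tools still run inside the agent, so they remain observable.
            self.trace.append(f"tool:{call.name}")
            results.append(self.tools[call.name](**call.args))
        self.trace.append("turn_end")
        return results


def set_pace(runner_id: str, pace_min_per_km: float) -> str:
    return f"{runner_id} pace set to {pace_min_per_km:.1f} min/km"


def autopilot_callback(state: Dict[str, Any]) -> Optional[ModelResponse]:
    # Invented deterministic rule: slow down when energy is low.
    pace = 6.5 if state["energy"] < 0.3 else 5.0
    return ModelResponse(tool_calls=[ToolCall(
        "set_pace",
        {"runner_id": state["runner_id"], "pace_min_per_km": pace})])


def unreachable_model(state: Dict[str, Any]) -> ModelResponse:
    raise AssertionError("the model should never be called")


agent = MiniAgent(name="runner", model=unreachable_model,
                  tools={"set_pace": set_pace},
                  before_model_callback=autopilot_callback)
results = agent.run({"runner_id": "r42", "energy": 0.2})
```

Note that the callback emits a tool call rather than changing the pace itself. Doing the work inside the callback would be shorter, but the tool invocation would then no longer appear in the agent's trace.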

Scaling to 1,000 Agents

To run 1,000 agents simultaneously, the system treats the simulator as a game server and the runners as independent entities. The runners use an "Autopilot" mode: a deterministic heuristic engine derived from research on runner behavior. Because these agents are stateless and run on Cloud Run, the system offloads session state to Redis. With this architecture, adding more runners does not increase token consumption: the AI makes only the initial judgment, and code handles the execution.
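
A minimal sketch of a stateless Autopilot step under these constraints might look like the following. The key format, the state fields, and the speed and energy constants are assumptions made for illustration, not the project's actual heuristics. `InMemoryStore` is a hypothetical stand-in so the example runs without a server; a Redis client exposing `get` and `set` could be passed in its place.

```python
import json
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for a shared session store such as Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def autopilot_tick(store, runner_id: str, dt_s: float) -> dict:
    """Advance one runner by dt_s seconds without calling any model.

    All state is loaded from and written back to the store, so any
    process instance can handle any runner's next tick.
    """
    key = f"runner:{runner_id}"  # assumed key scheme
    raw = store.get(key)
    state = json.loads(raw) if raw else {"distance_m": 0.0, "energy": 1.0}

    # Invented heuristic: run slower once energy drops below 30%.
    speed_m_s = 3.0 if state["energy"] > 0.3 else 2.0
    state["distance_m"] += speed_m_s * dt_s
    state["energy"] = max(0.0, state["energy"] - 0.001 * dt_s)

    store.set(key, json.dumps(state))
    return state


store = InMemoryStore()
# Two ticks for each of 1,000 runners; no LLM tokens are consumed.
for _ in range(2):
    for i in range(1000):
        autopilot_tick(store, f"r{i}", dt_s=10.0)

sample = json.loads(store.get("runner:r0"))
```

The function keeps nothing in process memory between calls, which is what allows instances to be added or removed freely as the number of runners grows.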

Key Takeaways

Optimize for the Right Tool: Use LLMs for judgment and reasoning; use deterministic algorithms for math, pathfinding, and repetitive procedural tasks.
Leverage Callbacks: Use interceptors like `before_model_callback` to bypass LLM calls for known, repeatable workflows while keeping the agent framework's telemetry and lifecycle benefits.
Design for Statelessness: When scaling to thousands of agents, keep processes stateless and use a shared session store (like Redis) to maintain state across distributed instances.
Unit Test Your Logic: Deterministic code is unit-testable, whereas LLM outputs are probabilistic. Move critical path logic into code to ensure reliability.
Use AI for Algorithm Selection: Even if the final implementation is deterministic, use LLMs (via AI Studio or CLI) to research, suggest, and review the best algorithms for your specific constraints.

Notable Quotes

"The real best answers are not to go with full large language models... the routing logic is fully deterministic."
"The model is required by the LLM agent in order to create the object but it's never actually called because the before_model_callback intercepts every invocation."
"It would be easy to write this where you short circuit like tool calling and your callback does all the work by itself but then you lose things like visibility and telemetry."
"The AI decides, the code runs."
