Optimizing Multi-Agent Systems for Production

Architecting for Performance and Scale

In multi-agent systems, every additional agent adds hand-offs and model calls, and the resulting latency can make real-time applications impractical. To optimize for production, keep only the agents the task actually needs and use efficient communication protocols between them. For the marathon planner demo, the team replaced a more complex multi-agent setup with a streamlined three-agent architecture: a planner, a simulator, and an evaluator.

To handle high-concurrency scenarios (such as running 1,000 sessions simultaneously), the team used the Agent2Agent (A2A) protocol over Pub/Sub and WebSockets. Serializing data with Protocol Buffers (protos) gave them significantly smaller payloads and faster transmission than standard HTTP calls, which made real-time updates to the frontend possible.
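The payload-size argument can be illustrated with a rough sketch. It uses Python's standard `struct` module as a stand-in for a schema-based binary format; it is not the Protocol Buffers wire format, and the message fields (`session_id`, `runner_id`, `distance_km`, `pace_min_per_km`) are hypothetical, not taken from the demo.

```python
import json
import struct

# Hypothetical position update for one simulated runner (field names are assumptions).
update = {
    "session_id": 482,
    "runner_id": 17,
    "distance_km": 12.5,
    "pace_min_per_km": 5.25,
}

# Text encoding: the field names are repeated inside every message.
json_bytes = json.dumps(update).encode("utf-8")

# Fixed binary layout known to both sides: two unsigned 32-bit integers,
# then two 32-bit floats, little-endian. Stand-in for a schema-based format.
LAYOUT = struct.Struct("<IIff")
binary_bytes = LAYOUT.pack(
    update["session_id"],
    update["runner_id"],
    update["distance_km"],
    update["pace_min_per_km"],
)


def decode(payload: bytes) -> dict:
    """Rebuild the update from the binary payload using the shared layout."""
    session_id, runner_id, distance_km, pace = LAYOUT.unpack(payload)
    return {
        "session_id": session_id,
        "runner_id": runner_id,
        "distance_km": distance_km,
        "pace_min_per_km": pace,
    }


print(f"JSON: {len(json_bytes)} bytes, binary: {len(binary_bytes)} bytes")
```

The saving comes from moving the field names out of each message and into a schema that both sides already share, which is the same idea Protocol Buffers relies on. Multiplied across many concurrent sessions sending frequent updates, that difference adds up.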

The Role of Evaluation and Governance

Evaluation is not a one-time task but a continuous requirement in agentic systems. The team implemented a "judge" pattern in which a separate agent (using Gemini 3.1 Pro) evaluates the output of the primary model (Gemini 3 Flash). Because the judge is a different model from the one that produced the output, it is less likely to share that model's biases, which makes the quality check more trustworthy.
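A minimal sketch of the judge pattern follows. It assumes each model can be wrapped as a plain function from prompt text to response text. The `Model` type, the 1-to-10 scoring prompt, the threshold, and the stub models are all hypothetical and stand in for real model clients.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a model is any callable from prompt text to response text.
Model = Callable[[str], str]


@dataclass
class Verdict:
    score: int
    passed: bool


def generate_and_judge(
    task: str, generator: Model, judge: Model, threshold: int = 7
) -> tuple[str, Verdict]:
    """Produce a draft with one model, then score it with a different model."""
    draft = generator(task)
    judge_prompt = (
        "Rate the following answer from 1 to 10. Reply with the number only.\n"
        f"Task: {task}\n"
        f"Answer: {draft}"
    )
    score = int(judge(judge_prompt).strip())
    return draft, Verdict(score=score, passed=score >= threshold)


# Stub models so the sketch runs without any API access.
def fake_generator(prompt: str) -> str:
    return f"Plan for: {prompt}"


def fake_judge(prompt: str) -> str:
    return "8"


draft, verdict = generate_and_judge("route a 42 km course", fake_generator, fake_judge)
print(draft, verdict)
```

Keeping the generator and the judge behind the same narrow interface makes it easy to swap either model without touching the evaluation logic.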

Key techniques for robust evaluation include:

Consolidated Checks: Instead of making multiple API calls for different criteria, combine non-deterministic checks into a single model call to reduce latency.
Real-time Monitoring: Use managed services (like the Gemini Enterprise Agent Platform) to track custom metrics over time.
Event-Driven Validation: Use ADK (Agent Development Kit) plugins to hook into the agent lifecycle, allowing for real-time validation of tool outputs and Agent-to-User Interface (A2UI) payloads.
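The consolidated-checks idea above can be sketched as a single judge call that returns a verdict for every criterion at once. The criteria, the prompt wording, and the JSON reply format are assumptions made for illustration; the judge is again a stubbed function rather than a real model client.

```python
import json
from typing import Callable

# Hypothetical interface: a model is any callable from prompt text to response text.
Model = Callable[[str], str]

# Illustrative criteria only; a real system would define its own.
CRITERIA = {
    "on_topic": "The answer addresses the marathon planning request.",
    "safe": "The answer contains no unsafe advice for runners.",
    "complete": "The answer includes a route, a start time, and aid stations.",
}


def build_prompt(output: str, criteria: dict) -> str:
    """Fold every criterion into one prompt instead of one prompt per criterion."""
    lines = [f"- {name}: {description}" for name, description in criteria.items()]
    return (
        "Check the answer against every criterion below.\n"
        "Reply with one JSON object mapping each criterion name to true or false.\n"
        + "\n".join(lines)
        + f"\nAnswer: {output}"
    )


def evaluate_once(output: str, judge: Model, criteria: dict = CRITERIA) -> dict:
    """Run all checks with a single model call and return per-criterion results."""
    reply = json.loads(judge(build_prompt(output, criteria)))
    if not isinstance(reply, dict):
        raise ValueError("judge reply must be a JSON object")
    missing = sorted(set(criteria) - set(reply))
    if missing:
        raise ValueError(f"judge skipped criteria: {missing}")
    return {name: bool(reply[name]) for name in criteria}


calls = []  # records every prompt sent to the stub judge


def fake_judge(prompt: str) -> str:
    calls.append(prompt)
    return json.dumps({"on_topic": True, "safe": True, "complete": False})


results = evaluate_once("Start at 7:00 along the river route.", fake_judge)
print(results, f"model calls: {len(calls)}")
```

Three criteria cost one round trip here instead of three. The explicit check for skipped criteria matters because a model may silently omit a key from its reply.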

Skills vs. Tools

Distinguishing between "skills" and "tools" is critical for maintainability. A skill represents a capability (like generating a specific UI component) that is bundled with instructions and prompts, while a tool is a functional mechanism for performing actions or validating data. By defining a standard catalog of A2UI components, agents can be instructed to emit JSON payloads that are validated by a dedicated tool before being rendered, ensuring the frontend remains stable and responsive.
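A validation tool of this kind might look like the sketch below. The catalog contents, the `components`/`type`/`props` payload shape, and the function name are hypothetical and are not the actual A2UI schema; the point is only that a payload is checked against a fixed catalog before anything is rendered.

```python
import json

# Hypothetical component catalog: component type -> required property names.
CATALOG = {
    "Text": {"text"},
    "Button": {"label", "action"},
    "Map": {"center", "zoom"},
}


def validate_payload(raw: str) -> list:
    """Return a list of problems; an empty list means the payload is safe to render."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    components = payload.get("components")
    if not isinstance(components, list):
        return ["payload must contain a 'components' list"]

    problems = []
    for index, component in enumerate(components):
        if not isinstance(component, dict):
            problems.append(f"component {index}: not an object")
            continue
        kind = component.get("type")
        if not isinstance(kind, str) or kind not in CATALOG:
            problems.append(f"component {index}: unknown type {kind!r}")
            continue
        props = component.get("props", {})
        if not isinstance(props, dict):
            problems.append(f"component {index}: 'props' must be an object")
            continue
        missing = sorted(CATALOG[kind] - set(props))
        if missing:
            problems.append(f"component {index}: missing properties {missing}")
    return problems


good = json.dumps({"components": [{"type": "Text", "props": {"text": "Mile 5"}}]})
bad = json.dumps({"components": [{"type": "Carousel", "props": {}}]})
print(validate_payload(good))
print(validate_payload(bad))
```

Returning a list of problems rather than raising on the first one lets the agent receive all the feedback in a single pass and regenerate the payload once.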
