Building and Scaling Production AI Agents at OpenGov

Moving to a Custom Agent Loop

OpenGov initially used LangGraph but moved to a custom agent loop built natively on Effect-TS to gain full control over the architecture. Building on Effect gives the team structured concurrency, logging, and dependency injection out of the box, and the dependency injection makes it easy to hot-swap language models. The custom harness gives the team the control needed for complex, production-grade government workflows that generic frameworks struggled to support at scale.
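To make the hot-swapping idea concrete, here is a minimal sketch of an agent loop that receives its model as a dependency. It is written in plain TypeScript rather than Effect so that it stays self-contained, and every name in it (`LanguageModel`, `runAgent`, `lookupPermit`, the stub models) is a hypothetical illustration, not OpenGov's actual code.

```typescript
// Hypothetical types for illustration; not OpenGov's actual interfaces.
type Message = { role: "user" | "assistant" | "tool"; content: string };

type ModelReply =
  | { kind: "final"; text: string }
  | { kind: "toolCall"; tool: string; input: string };

// The loop depends only on this interface, so any provider can be swapped in.
interface LanguageModel {
  readonly name: string;
  complete(history: readonly Message[]): Promise<ModelReply>;
}

type Tool = (input: string) => Promise<string>;

async function runAgent(
  model: LanguageModel,
  tools: ReadonlyMap<string, Tool>,
  userInput: string,
  maxSteps = 5,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model.complete(history);
    if (reply.kind === "final") return reply.text;
    // The model asked for a tool: run it and feed the result back in.
    const tool = tools.get(reply.tool);
    const result = tool ? await tool(reply.input) : `unknown tool: ${reply.tool}`;
    history.push({ role: "tool", content: result });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}

// Two stand-in models: one answers directly, one requests a tool first.
const directModel: LanguageModel = {
  name: "direct-stub",
  complete: async () => ({ kind: "final", text: "done" }),
};

const toolUsingModel: LanguageModel = {
  name: "tool-stub",
  complete: async (history) => {
    const last = history[history.length - 1];
    if (last !== undefined && last.role === "tool") {
      return { kind: "final", text: `permit status: ${last.content}` };
    }
    return { kind: "toolCall", tool: "lookupPermit", input: "P-100" };
  },
};

const demoTools = new Map<string, Tool>([
  ["lookupPermit", async (id) => `${id} approved`],
]);
```

Because `runAgent` never names a concrete provider, replacing `directModel` with `toolUsingModel` (or with a real client that implements the same interface) requires no change to the loop itself.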

Safety, Observability, and Context Management

Operating in the government sector requires high reliability and trust. OpenGov employs four primary strategies to manage this:

Human-in-the-Loop: For mutating operations, the agent loop deterministically interrupts execution to require explicit human approval via the UI.
Sandboxing: Agents execute code and create files within ephemeral, isolated sandboxes, which keeps that work away from production systems.
Observability: By leveraging Effect’s native tracing, the team gets granular visibility into function calls and bottlenecks out of the box. This allows for cross-service debugging, which is critical when agents interact with multiple internal APIs.
Long Context: To handle token limits and conversation bloat, the team uses a rolling summarization strategy. Rather than stuffing the entire history into the prompt, they maintain a running summary of the conversation, allowing the agent to recall past topics without overloading the context window.
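The human-in-the-loop interrupt from the list above can be sketched as a small state machine. This is a minimal illustration under assumed names (`ToolSpec`, `requestCall`, `resumeCall`, and the budget tools are all hypothetical); the point is that the pause comes from a plain check on a `mutating` flag rather than from a model decision.

```typescript
// Hypothetical shapes for illustration; not OpenGov's actual types.
type ToolCall = { tool: string; args: Record<string, string> };

type ToolSpec = {
  mutating: boolean; // true when the tool changes data
  run: (args: Record<string, string>) => string;
};

type StepResult =
  | { status: "completed"; output: string }
  | { status: "awaitingApproval"; pending: ToolCall }
  | { status: "rejected"; pending: ToolCall };

function findTool(tools: ReadonlyMap<string, ToolSpec>, name: string): ToolSpec {
  const spec = tools.get(name);
  if (!spec) throw new Error(`unknown tool: ${name}`);
  return spec;
}

// Read-only calls run immediately; every mutating call pauses here.
function requestCall(tools: ReadonlyMap<string, ToolSpec>, call: ToolCall): StepResult {
  const spec = findTool(tools, call.tool);
  if (spec.mutating) return { status: "awaitingApproval", pending: call };
  return { status: "completed", output: spec.run(call.args) };
}

// Called once the UI reports the reviewer's decision on a pending call.
function resumeCall(
  tools: ReadonlyMap<string, ToolSpec>,
  pending: ToolCall,
  decision: "approve" | "reject",
): StepResult {
  if (decision === "reject") return { status: "rejected", pending };
  return { status: "completed", output: findTool(tools, pending.tool).run(pending.args) };
}

// Demo tools: one read-only, one that mutates an in-memory ledger.
const ledger: string[] = [];
const gatedTools = new Map<string, ToolSpec>([
  ["readBudget", { mutating: false, run: () => "budget: 100" }],
  [
    "updateBudget",
    {
      mutating: true,
      run: (args) => {
        ledger.push(args["amount"] ?? "unset");
        return "updated";
      },
    },
  ],
]);
```

Calling `requestCall(gatedTools, { tool: "updateBudget", args: { amount: "250" } })` returns an `awaitingApproval` result and leaves `ledger` untouched until `resumeCall` is invoked with `"approve"`.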
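The rolling summarization strategy can be sketched in the same spirit. The version below is an assumption about one reasonable shape, not the team's implementation: it keeps the last few turns verbatim and folds older turns into a running summary. The `Summarizer` here is a stand-in that simply joins text; in practice that step would likely be a model call.

```typescript
// Hypothetical types for illustration.
type Turn = { role: "user" | "assistant"; text: string };

// Folds evicted turns into the previous summary and returns the new summary.
type Summarizer = (previousSummary: string, evicted: readonly Turn[]) => string;

function createRollingContext(maxRecent: number, summarize: Summarizer) {
  let summary = "";
  const recent: Turn[] = [];
  return {
    add(turn: Turn): void {
      recent.push(turn);
      if (recent.length > maxRecent) {
        // Evict the oldest turns and compress them into the running summary.
        const evicted = recent.splice(0, recent.length - maxRecent);
        summary = summarize(summary, evicted);
      }
    },
    // The prompt holds the summary plus recent turns, never the full history.
    buildPrompt(): string {
      const lines = recent.map((t) => `${t.role}: ${t.text}`);
      if (summary !== "") lines.unshift(`summary: ${summary}`);
      return lines.join("\n");
    },
  };
}

// Stand-in summarizer: joins the old summary with the evicted text.
const joinSummarizer: Summarizer = (previous, evicted) =>
  [previous, ...evicted.map((t) => t.text)].filter((s) => s !== "").join("; ");

const rollingContext = createRollingContext(2, joinSummarizer);
rollingContext.add({ role: "user", text: "What are the permit fees?" });
rollingContext.add({ role: "assistant", text: "Permit fees are listed by project type." });
rollingContext.add({ role: "user", text: "Which zoning rules apply?" });
rollingContext.add({ role: "assistant", text: "That depends on the parcel." });
const rollingPrompt = rollingContext.buildPrompt();
```

After the four turns above, `rollingPrompt` contains one `summary:` line covering the two permit-fee turns, followed by the two zoning turns verbatim, so the prompt size is bounded by `maxRecent` plus the length of the summary.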

Standardizing via Protocols and Tools

To maintain consistency across diverse product suites, OpenGov adopted the A2A (Agent-to-Agent) protocol. This provides a rigorous contract for agent definitions, ensuring that front-end and back-end components remain aligned.

Development velocity is further accelerated by treating AI capabilities as modular 'tools' and 'skills.' By registering these tools into toolkits, the team can dynamically expose functionality to the model. This modularity extends to the UI; the agent can register primitives that allow it to generate interactive forms or UI components on the fly, creating a personalized experience for the end user. Finally, the team uses automated evals in CI/CD pipelines to test prompts against real-world tool calls, ensuring that updates do not degrade accuracy.
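As a rough sketch of the toolkit idea, the registry below stores tool definitions and exposes only a tagged subset to the model. The tagging scheme and all names (`createToolkit`, `expose`, `lookupPermit`, and so on) are assumptions made for illustration; the source does not describe how OpenGov selects which tools to expose.

```typescript
// Hypothetical shapes for illustration; not OpenGov's actual types.
type ToolDefinition = {
  name: string;
  description: string;
  tags: readonly string[];
  run: (input: string) => string;
};

// The subset of a definition that would be shown to the model.
type ToolSchema = { name: string; description: string };

function createToolkit() {
  const registry = new Map<string, ToolDefinition>();
  return {
    register(tool: ToolDefinition): void {
      if (registry.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
      registry.set(tool.name, tool);
    },
    // Expose only the tools relevant to the current product area.
    expose(tag: string): ToolSchema[] {
      return Array.from(registry.values())
        .filter((tool) => tool.tags.some((t) => t === tag))
        .map(({ name, description }) => ({ name, description }));
    },
    invoke(name: string, input: string): string {
      const tool = registry.get(name);
      if (!tool) throw new Error(`unknown tool: ${name}`);
      return tool.run(input);
    },
  };
}

const toolkit = createToolkit();
toolkit.register({
  name: "lookupPermit",
  description: "Look up a permit by ID",
  tags: ["permitting"],
  run: (id) => `permit ${id}: open`,
});
toolkit.register({
  name: "listInspections",
  description: "List inspections for a permit",
  tags: ["permitting"],
  run: (id) => `no inspections scheduled for ${id}`,
});
toolkit.register({
  name: "getBudgetLine",
  description: "Fetch a budget line item",
  tags: ["budgeting"],
  run: (line) => `budget line ${line}`,
});

const permittingTools = toolkit.expose("permitting");
```

Here `permittingTools` lists only `lookupPermit` and `listInspections`, so a permitting agent never sees the budgeting tool even though all three live in the same registry.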
