Building Long-Running AI Agents with Google's Agentic Stack

The Shift from Chatbots to Long-Running Agents

Traditional AI agent demos are often stateless: they live inside a single context window that eventually fills up and discards information. Addy Osmani, Director of Google Cloud AI, argues that production-grade agents must instead function as "long-running" systems that persist, sleep, and self-correct over hours, days, or weeks. The unit of work for these agents is not a prompt but a complete, end-to-end workflow.

The Three Pillars of Production-Grade Agents

To move beyond "vibe coding" and build reliable systems, Osmani identifies three non-negotiable requirements:

Durable State Checkpoints: State must be persisted on every transition. If a container crashes, the agent must be able to resume exactly where it left off without hallucinating memory or repeating intermediate steps.
Event-Driven Dormancy: Agents must be able to "sleep." This avoids the inefficiency of active polling or blocked threads. Agents should remain dormant until an external event—such as a webhook, scheduled task, or human approval—wakes them up.
Separated Evaluation: Agents should not grade their own work, as they tend to be overconfident in their output. A robust setup requires a three-agent architecture: a Planner to define the path, a Generator to execute the task, and an Evaluator to test the results against defined success criteria.
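
The separated-evaluation pillar can be sketched as a small control loop. This is a minimal illustration, not the ADK API: `Verdict`, `run_with_separate_evaluator`, and the three toy callables are hypothetical names, and in a real system each role would be backed by its own model call or agent. The point it shows is that the Generator never decides whether its own output passes; only the Evaluator's verdict moves the workflow forward.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    passed: bool
    feedback: str


def run_with_separate_evaluator(
    goal: str,
    planner: Callable[[str], List[str]],
    generator: Callable[[str, str], str],
    evaluator: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> List[str]:
    """Plan once, then generate each step until the evaluator accepts it."""
    results = []
    for step in planner(goal):
        feedback = ""
        for _ in range(max_attempts):
            output = generator(step, feedback)
            verdict = evaluator(step, output)  # independent judgment
            if verdict.passed:
                results.append(output)
                break
            feedback = verdict.feedback  # fed into the next attempt
        else:
            raise RuntimeError(
                f"step {step!r} failed evaluation after {max_attempts} attempts"
            )
    return results


# Toy stand-ins (assumptions for illustration only).
def toy_planner(goal: str) -> List[str]:
    return [f"{goal}: part {i}" for i in (1, 2)]


def toy_generator(step: str, feedback: str) -> str:
    # The first draft ignores the criterion; the retry applies the feedback.
    return step.upper() if feedback else step


def toy_evaluator(step: str, output: str) -> Verdict:
    if output.isupper():
        return Verdict(True, "")
    return Verdict(False, "use upper case")


if __name__ == "__main__":
    print(run_with_separate_evaluator(
        "demo", toy_planner, toy_generator, toy_evaluator
    ))
```

Capping retries with `max_attempts` is a design choice of this sketch: a step that never satisfies the success criteria raises an error rather than looping indefinitely.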

Practical Implementations

Osmani demonstrates these concepts through three distinct use cases:

HR Onboarding Coordinator: An agent that manages a multi-day onboarding process. It sends welcome packets, waits for human signatures, delegates IT provisioning to sub-agents, and confirms hardware delivery before generating a day-one schedule. The UI updates only after the backend state machine confirms that each step has completed.
Autonomous OS Construction: Using a goal-oriented primitive, an agent built a functional operating system featuring a window manager, file explorer, terminal, and even a playable version of Doom. This demonstrates how agents can handle complex, multi-step coding tasks by treating each commit as a save point.
3D Environment Generation: An agent-driven workflow in Blender that generated a complex, nostalgic 3D video store. This highlights the ability of agents to handle creative, multi-day rendering and design tasks that would be impossible in a single prompt.
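
The onboarding example combines two of the pillars: a checkpoint on every transition and dormancy until an external event arrives. The sketch below is one way that could look in plain Python, under stated assumptions: the step names, event names, and the JSON checkpoint file are all hypothetical, and a production system would use a durable store and a real event source (webhook, scheduler, approval queue) rather than a local file and a function argument.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Ordered workflow. Each step either runs immediately (None) or
# waits for a named external event. All names are illustrative.
STEPS = [
    ("send_welcome_packet", None),
    ("await_signature", "signature_received"),
    ("provision_it", None),
    ("confirm_hardware", "hardware_delivered"),
    ("generate_schedule", None),
]


def load_state(path: Path) -> dict:
    """Resume from the last checkpoint, or start fresh."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "log": []}


def save_state(path: Path, state: dict) -> None:
    # Write to a temp file, then rename, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def advance(path: Path, event: Optional[str] = None) -> str:
    """Run until the workflow finishes or needs an event.

    Returns "done", or the name of the event being waited on.
    """
    state = load_state(path)
    while state["next_step"] < len(STEPS):
        name, required_event = STEPS[state["next_step"]]
        if required_event is not None:
            if event != required_event:
                return required_event  # go dormant; nothing polls
            event = None  # consume the event
        state["log"].append(name)
        state["next_step"] += 1
        save_state(path, state)  # checkpoint on every transition
    return "done"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "onboarding.json"
        print(advance(ckpt))                        # waits for signature
        print(advance(ckpt, "signature_received"))  # waits for hardware
        print(advance(ckpt, "hardware_delivered"))  # finishes
```

Because `advance` reloads the checkpoint on every call, each invocation can run in a fresh process or container: a wake-up with no matching event leaves the state untouched, and completed steps are not repeated.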

Overcoming Developer Challenges

Building these systems requires moving away from monolithic loops. Osmani emphasizes the importance of "Agent Harness Engineering," where memory is kept as structured, readable records (such as markdown logs) rather than raw JSON chat history. Using the Gemini Enterprise agent platform and the Agent Development Kit (ADK) 2.0, developers can manage infrastructure that supports long-term memory and state-machine-driven workflows, which helps keep the agent on track without drifting or breaking.
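
As a rough illustration of the markdown-log idea, the sketch below appends one short structured entry per step and reads back only the most recent entries when rebuilding context. It is an assumption about how such a log might be laid out, not the format used by ADK or shown in the talk; `append_entry`, `recent_entries`, and the entry fields are hypothetical.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def append_entry(log_path: Path, step: str, outcome: str, next_action: str) -> None:
    """Append one structured entry to a markdown progress log."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = (
        f"## {stamp} | {step}\n"
        f"- outcome: {outcome}\n"
        f"- next: {next_action}\n\n"
    )
    with log_path.open("a", encoding="utf-8") as f:
        f.write(entry)


def recent_entries(log_path: Path, limit: int = 3) -> List[str]:
    """Return the last `limit` entries, oldest first."""
    if not log_path.exists():
        return []
    text = log_path.read_text(encoding="utf-8")
    # Entries start at a line beginning with "## ".
    chunks = re.split(r"^## ", text, flags=re.MULTILINE)
    entries = ["## " + c.strip() for c in chunks if c.strip()]
    return entries[-limit:]
```

A resumed agent could then be given `"\n\n".join(recent_entries(path))` as its working memory instead of the full chat transcript, which keeps the context small and lets a human audit the same file.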
