Four Bets to Build Reliable Production Agents

Embed Agent Identities at Platform Level to Eliminate Governance Debt

Most agents today run on shared service accounts or inherited OAuth tokens, leaving no visibility into which tools they call, what data they move, or which credentials they hold. The resulting security risks alarm CISOs and force rewrites after incidents. The fix: shift agent identity from application middleware (latency-heavy, bypassable) to platform-level enforcement, where each agent gets a distinct, unforgeable identity that is checked at the network layer. An unauthorized database access fails before a connection is ever established. This makes fleets auditable, revocable, and attributable like a managed workforce, turning liabilities into governed systems.
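As a rough illustration of the idea, the sketch below models a deny-by-default check that runs before any connection is opened and records every decision for audit. It is an in-process toy, not network-layer enforcement, and every name in it (AgentIdentity, PolicyEnforcer, connect, the resource strings) is hypothetical rather than taken from any particular platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical per-agent identity; a real platform would back this
    with a signed, unforgeable credential rather than a plain string."""
    agent_id: str
    owner: str


@dataclass
class PolicyEnforcer:
    """Deny-by-default check that runs before any connection is opened."""
    grants: dict = field(default_factory=dict)     # agent_id -> set of resources
    revoked: set = field(default_factory=set)      # revoked agent_ids
    audit_log: list = field(default_factory=list)  # one entry per decision

    def grant(self, agent: AgentIdentity, resource: str) -> None:
        self.grants.setdefault(agent.agent_id, set()).add(resource)

    def revoke(self, agent: AgentIdentity) -> None:
        self.revoked.add(agent.agent_id)

    def authorize(self, agent: AgentIdentity, resource: str) -> bool:
        allowed = (
            agent.agent_id not in self.revoked
            and resource in self.grants.get(agent.agent_id, set())
        )
        # Every decision, allowed or denied, is attributable to one agent.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent.agent_id,
            "owner": agent.owner,
            "resource": resource,
            "allowed": allowed,
        })
        return allowed


def connect(enforcer: PolicyEnforcer, agent: AgentIdentity, resource: str) -> str:
    """Fail before connecting if the agent is not authorized."""
    if not enforcer.authorize(agent, resource):
        raise PermissionError(f"{agent.agent_id} may not access {resource}")
    return f"connected:{resource}"  # placeholder for opening a real connection
```

Because each agent carries its own identity, revoking one agent does not disturb the rest of the fleet, and the audit log can answer who touched what without relying on the application to report it.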

Deliver Universal Context and Durable Execution for Multi-Week Missions

Teams waste engineering effort on custom serialization, session stores, and siloed context (e.g., browser tabs or dragged-in files). The agents that result forget their missions mid-task and cannot reason across CRM, ERP, data warehouse, or ticketing systems. The solution is platform-integrated universal context that gives agents seamless access to business systems and lifts them past the 'spreadsheet autocomplete' ceiling. For durability, agents must survive laptop closures, credential rotations, and restarts through default state checkpointing, long-horizon memory with summarization, mission persistence across days and handoffs, and human-in-the-loop pauses. This supports procurement workflows that run for weeks, audits that run for months, and incidents that span multiple rotations, all with auditable trails, far beyond session-based demos.
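A minimal sketch of default state checkpointing with a human-in-the-loop pause might look like the following. It assumes a single local JSON file as the checkpoint store and a simple list of named steps. The function names, the state layout, and the step names are all hypothetical, and a production platform would use a durable shared store rather than the local filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically: write a temp file, then rename over the target,
    so a crash mid-write never leaves a half-written checkpoint."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_name, path)


def load_checkpoint(path: Path, steps: list) -> dict:
    """Resume from the saved state if one exists, else start a new mission."""
    if path.exists():
        return json.loads(path.read_text())
    return {"steps": steps, "next_step": 0, "status": "running", "log": []}


def run_mission(path: Path, steps: list,
                approvals_needed=frozenset(), approved=frozenset()) -> dict:
    state = load_checkpoint(path, steps)
    while state["next_step"] < len(state["steps"]):
        step = state["steps"][state["next_step"]]
        if step in approvals_needed and step not in approved:
            # Pause for a human; the process can exit and resume days later.
            state["status"] = "waiting_for_human"
            save_checkpoint(path, state)
            return state
        state["log"].append(f"done:{step}")  # placeholder for the real work
        state["next_step"] += 1
        state["status"] = "running"
        save_checkpoint(path, state)  # checkpoint after every completed step
    state["status"] = "complete"
    save_checkpoint(path, state)
    return state
```

Calling run_mission again with the same path after a restart picks up at the first unfinished step, and the log field doubles as a simple audit trail of what has already been done.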

Build on Open Agent Platforms to Reclaim Engineering Bandwidth

Teams burn cycles on undifferentiated plumbing such as memory layers, eval harnesses, observability, and retries, which diverts effort from domain-specific logic (e.g., company regulations). The bet is on open-source orchestration frameworks that mirror the evolution of cloud and containers: prototype locally with production-grade primitives (identity, context, persistence), then scale to managed platforms without rewrites. The prediction is that within 12 months every serious team will adopt this approach, and that over five years the winners will be those who chose foundations that let them solve their unique problems rather than rebuild boilerplate.
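One way to picture "scale without rewrites" is domain logic written against a small interface, with the backend swapped underneath it. The sketch below is an assumption about how such a primitive could be shaped, not the API of any existing framework. StateStore, InMemoryStore, ManagedStoreStub, and record_compliance_decision are hypothetical names, and the managed backend is only a stand-in that stores data locally.

```python
from typing import Optional, Protocol


class StateStore(Protocol):
    """Hypothetical persistence primitive that the domain logic depends on."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Backend for local prototyping."""
    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class ManagedStoreStub:
    """Stand-in for a managed platform backend; a real one would call a
    remote service instead of keeping a namespaced local dict."""
    def __init__(self, tenant: str) -> None:
        self._tenant = tenant
        self._remote = {}

    def put(self, key: str, value: str) -> None:
        self._remote[f"{self._tenant}/{key}"] = value

    def get(self, key: str) -> Optional[str]:
        return self._remote.get(f"{self._tenant}/{key}")


def record_compliance_decision(store: StateStore, case_id: str, decision: str) -> Optional[str]:
    """Domain-specific logic: unchanged no matter which backend is used."""
    store.put(f"case:{case_id}", decision)
    return store.get(f"case:{case_id}")
```

The function that encodes company-specific rules never changes when the store moves from a laptop to a managed service, which is the property the bet depends on.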

These bets address 'excessive agency': broad permissions that expose teams to runtime drift, API changes, and PII leaks. They close the reliability gap that opened as 18 months of model improvements outpaced the evolution of the surrounding stack. Environments like Claude Code and OpenClaw show that persistent state and coordination are feasible, raising the bar to quarter-long missions.