The Shift to Governance by Construction

The paper argues that current approaches to AI safety, which rely heavily on post-hoc monitoring, guardrails, and external oversight, are insufficient as generalist agents become more autonomous. Instead, it proposes 'Governance by Construction,' a methodology that treats safety as a foundational architectural requirement rather than an add-on. By embedding governance mechanisms directly into the agent's construction, developers constrain agents to defined boundaries by design, reducing the risk of emergent, unsafe behaviors in complex environments.

Architectural Constraints and Execution Environments

The core of this approach involves moving beyond prompt-based restrictions. The authors advocate for:

  • Restricted Execution Environments: Running agents within sandboxed environments where capabilities (e.g., file system access, network calls, API interactions) are strictly mediated by a governance layer that enforces policy at the system level.
  • Formal Verification of Agent Logic: Applying software engineering principles to agent workflows, where critical decision paths are verified against safety specifications before deployment.
  • Immutable Policy Enforcement: Ensuring that the rules governing an agent's actions cannot be overridden by the agent itself, even if it is prompted or manipulated to do so. This creates a clear separation between the agent's 'reasoning' (the LLM) and its 'execution' (the governed environment).
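
The separation between reasoning and execution described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names Policy, GovernedEnvironment, and PolicyViolation, the "/sandbox/" prefix, and the stub tools are all hypothetical. The agent proposes a capability name and arguments, and only the governed environment holds the actual tools.

```python
import posixpath
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, FrozenSet, Mapping


class PolicyViolation(Exception):
    """Raised when a requested action falls outside the policy."""


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy shape: an allowlist of capability names and a
    # directory prefix for file access. frozen=True blocks reassignment.
    allowed: FrozenSet[str]
    path_prefix: str = "/sandbox/"


class GovernedEnvironment:
    """Mediates every capability call; the agent never holds the raw tools."""

    def __init__(self, policy: Policy, tools: Mapping[str, Callable[..., str]]):
        self._policy = policy
        # Read-only view so the tool table cannot be edited through it.
        self._tools = MappingProxyType(dict(tools))

    def execute(self, capability: str, **kwargs) -> str:
        # Check 1: the capability itself must be on the allowlist.
        if capability not in self._policy.allowed:
            raise PolicyViolation(f"capability not permitted: {capability}")
        # Check 2: any 'path' argument must stay inside the sandbox,
        # after normalisation so that '..' segments cannot escape it.
        if "path" in kwargs:
            normalized = posixpath.normpath(kwargs["path"])
            if not normalized.startswith(self._policy.path_prefix):
                raise PolicyViolation(f"path outside sandbox: {kwargs['path']}")
            kwargs["path"] = normalized
        return self._tools[capability](**kwargs)


# Usage with stub tools (no real I/O is performed).
policy = Policy(allowed=frozenset({"read_file"}))
env = GovernedEnvironment(
    policy,
    {
        "read_file": lambda path: f"contents of {path}",
        "http_get": lambda url: f"response from {url}",
    },
)
result = env.execute("read_file", path="/sandbox/notes.txt")
```

A frozen dataclass and a read-only mapping only guard against accidental mutation inside one Python process. For the guarantee the bullet describes, the environment would presumably run behind a process or operating-system boundary that the agent's output cannot reach.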

Bridging AI and Software Engineering

By framing AI safety as a software engineering problem, the paper emphasizes that reliability in generalist agents requires the same rigor as mission-critical software. This includes:

  • Declarative Governance: Defining safety policies as code or configuration that is version-controlled and auditable.
  • Observability as Governance: Using telemetry not just for debugging, but as a real-time feedback loop for the governance layer to throttle or halt agents that deviate from expected behavioral patterns.
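
As a rough sketch of how these two ideas might combine, the example below loads a policy from a JSON document (standing in for a version-controlled file) and feeds telemetry events to a governor that returns allow, throttle, or halt. The field names, thresholds, and the TelemetryGovernor class are assumptions made for illustration and do not come from the paper.

```python
import json
from collections import deque
from enum import Enum

# Hypothetical policy document; in practice it would live in a reviewed,
# version-controlled file rather than an inline string.
POLICY_JSON = """
{
  "version": "1.0.0",
  "window_seconds": 60,
  "throttle_after_calls": 5,
  "halt_after_denials": 2
}
"""


class Verdict(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    HALT = "halt"


class TelemetryGovernor:
    """Turns a stream of telemetry events into governance verdicts."""

    def __init__(self, policy: dict):
        self.policy = policy
        self.calls = deque()  # timestamps of calls inside the window
        self.denials = 0      # cumulative count of denied actions
        self.halted = False

    def record(self, timestamp: float, denied: bool = False) -> Verdict:
        # A halt is sticky: once halted, the agent stays halted.
        if self.halted:
            return Verdict.HALT
        if denied:
            self.denials += 1
        if self.denials >= self.policy["halt_after_denials"]:
            self.halted = True
            return Verdict.HALT
        # Drop calls that have aged out of the sliding window.
        cutoff = timestamp - self.policy["window_seconds"]
        while self.calls and self.calls[0] <= cutoff:
            self.calls.popleft()
        self.calls.append(timestamp)
        if len(self.calls) > self.policy["throttle_after_calls"]:
            return Verdict.THROTTLE
        return Verdict.ALLOW


governor = TelemetryGovernor(json.loads(POLICY_JSON))
# Timestamps are passed in explicitly so the behaviour is deterministic.
verdicts = [governor.record(float(t)) for t in range(6)]
```

With these thresholds the first five calls are allowed and the sixth call inside the same window is throttled. Two denied actions halt the agent until an operator intervenes. Because the thresholds are plain data, a change to them shows up as a reviewable diff.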

This paradigm shift suggests that the future of safe AI lies not in making models 'smarter' or more 'aligned' in isolation, but in building robust, constrained systems that treat the LLM as one component within a larger, governed architecture.