The Disconnect Between Development and Deployment

The research highlights a critical "containment gap" in modern agentic AI frameworks. Developers often build agents in controlled, sandboxed environments, but the frameworks themselves frequently lack the multi-layered safety protocols that public-facing, real-world applications require. The core issue is that current architectures prioritize task completion and autonomy over strict boundary enforcement, so agents can inadvertently bypass safety constraints when exposed to unpredictable user inputs or external data sources.

Architectural Failures in Safety Enforcement

The authors argue that existing frameworks fail to treat "containment" as a first-class architectural concern. Rather than building safety into the architecture, many systems rely on post-hoc filtering or simple prompt-based guardrails. These methods are insufficient for agentic workflows, where the agent's ability to reason, plan, and execute multi-step actions creates complex failure modes. The paper suggests that without a fundamental shift toward verifiable isolation of agent actions at the hardware or kernel level, deployed systems will remain vulnerable to "jailbreaking" and to unintended side effects that standard safety layers cannot catch.
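
To make the contrast with prompt-based guardrails concrete, the following is a minimal sketch of a check enforced at the point where an action executes, so it holds regardless of what the model was told or what it decided. The class names `SandboxedFileReader` and `ContainmentError` are hypothetical and do not come from the paper. This is an application-level check only; it is not the hardware-level or kernel-level isolation the authors call for, and it is shown purely to illustrate where enforcement sits.

```python
from pathlib import Path


class ContainmentError(Exception):
    """Raised when a requested action falls outside the allowed boundary."""


class SandboxedFileReader:
    """Hypothetical file-reading tool confined to a single root directory.

    The boundary is checked in code at execution time, not requested
    in a prompt, so a manipulated plan cannot talk its way past it.
    """

    def __init__(self, root):
        # Resolve once so later comparisons use a canonical absolute path.
        self.root = Path(root).resolve()

    def read(self, requested):
        # Resolving collapses ".." segments and follows symlinks, so the
        # check below sees the real target rather than the literal string.
        target = (self.root / requested).resolve()
        if target != self.root and self.root not in target.parents:
            raise ContainmentError(f"{requested!r} escapes the sandbox root")
        return target.read_text()
```

A request such as `reader.read("../../etc/passwd")` raises `ContainmentError` before any file is opened, whatever instructions led the agent to attempt it.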

Implications for Production Systems

For builders, the takeaway is that relying on framework-provided safety defaults is currently inadequate for high-stakes or public-facing products. The authors emphasize that developers must implement their own "containment" strategies, such as strict capability limiting, human-in-the-loop verification for sensitive actions, and rigorous runtime monitoring. The research serves as a warning: as agentic capabilities grow, the gap between what a model can do in a lab and what it can safely do in the wild is widening, and closing it requires a more rigorous approach to AI systems engineering.
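
The three strategies named above can be sketched together in a small wrapper around tool execution. This is an illustrative sketch, not the authors' implementation: `ContainedExecutor`, `ActionDenied`, the `approve` callback, and the example tool names are all assumptions introduced here. The allowlist stands in for capability limiting, the `approve` callback for human-in-the-loop verification, and the audit log for runtime monitoring.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


class ActionDenied(Exception):
    """Raised when the executor refuses to run a requested tool call."""


@dataclass
class ContainedExecutor:
    tools: Dict[str, Callable[..., Any]]   # every tool the agent could name
    allowed: Set[str]                      # capability limiting (allowlist)
    sensitive: Set[str]                    # tools that need human sign-off
    approve: Callable[[str, dict], bool]   # hypothetical human-approval hook
    audit_log: List[dict] = field(default_factory=list)  # runtime monitoring

    def run(self, tool_name: str, args: dict) -> Any:
        # 1. Capability limiting: anything not explicitly allowed is denied.
        if tool_name not in self.allowed or tool_name not in self.tools:
            self._record(tool_name, args, "denied")
            raise ActionDenied(f"tool {tool_name!r} is not permitted")
        # 2. Human-in-the-loop: sensitive actions wait for explicit approval.
        if tool_name in self.sensitive and not self.approve(tool_name, args):
            self._record(tool_name, args, "rejected")
            raise ActionDenied(f"tool {tool_name!r} was not approved")
        result = self.tools[tool_name](**args)
        # 3. Runtime monitoring: record what ran, alongside refusals.
        self._record(tool_name, args, "executed")
        return result

    def _record(self, tool_name: str, args: dict, outcome: str) -> None:
        self.audit_log.append(
            {"tool": tool_name, "args": args, "outcome": outcome}
        )


# Example wiring with made-up tools; approval is hard-coded to "no" here.
executor = ContainedExecutor(
    tools={
        "search": lambda query: f"results for {query}",
        "send_email": lambda to, body: f"sent to {to}",
        "drop_table": lambda: "dropped",
    },
    allowed={"search", "send_email"},
    sensitive={"send_email"},
    approve=lambda tool_name, args: False,
)
```

In a real deployment the `approve` hook would block on a human decision and the audit log would feed an alerting system. The design point is that all three checks sit outside the model, so they apply even when the agent's plan has been manipulated.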