Agents SDK Upgrades Harness, Sandbox, and Compute Separation

OpenAI's updated Agents SDK (v0.14.0+) adds a model-native harness for file and tool work, native sandbox execution across providers such as E2B and Modal, and harness-compute separation, so that agents working on long tasks can be secure, durable, and scalable.

Unlock Frontier Model Capabilities with Enhanced Harness

Build agents that handle documents, files, commands, and long-horizon tasks by using the Agents SDK's model-native harness, which aligns execution with how models such as gpt-5.4 perform best. Install it with pip install "openai-agents>=0.14.0", then create a SandboxAgent with instructions such as "Answer using only files in data/. Cite source filenames." and a Manifest that lists the workspace entries (for example, a LocalDir for the data directory).

The harness integrates primitives including MCP for tool use, skills for progressive disclosure, AGENTS.md for custom instructions, shell for code execution, and apply-patch for file edits. This reduces the need for custom infrastructure, improves reliability on complex multi-step tasks, and supports configurable memory and sandbox-aware orchestration. For example, calling Runner.run(agent, "Compare FY2025 revenue...", run_config=RunConfig(sandbox=SandboxRunConfig(client=UnixLocalSandboxClient()))) analyzes metrics.md inside the sandbox and returns cited comparisons, such as FY2025 revenue up 26% from FY2024's $98.7M.
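As a quick sanity check on the figures quoted in that example, the FY2025 revenue implied by a 26% increase over $98.7M can be worked out directly. This is plain arithmetic on the two numbers in the summary, not SDK output. The variable names are illustrative, and the 26% is assumed to be a rounded value.

```python
# Figures quoted in the example above; variable names are illustrative.
fy2024_revenue_musd = 98.7  # FY2024 revenue, in millions of USD
growth_rate = 0.26          # "up 26%", assumed to be a rounded figure

# Implied FY2025 revenue under the stated growth rate.
fy2025_revenue_musd = fy2024_revenue_musd * (1 + growth_rate)

print(f"Implied FY2025 revenue: about ${fy2025_revenue_musd:.1f}M")
```

Under these assumptions the implied FY2025 figure is roughly $124.4M.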

The update addresses the trade-offs of earlier approaches: model-agnostic frameworks underuse the models, provider SDKs lack visibility into the harness, and managed APIs limit data access. The aim is production viability. Oscar Health, for example, uses it in a clinical records workflow in which agents parse encounter boundaries in long documents to surface patient insights faster.

Secure Workspaces via Native Sandbox Support

Give agents controlled environments for reading and writing files, installing dependencies, and running code, without having to piece together the execution layers yourself. Use a Manifest to define a portable workspace: mount local files, declare output directories, and connect storage such as AWS S3, GCS, Azure Blob, and Cloudflare R2.
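The idea of a declared, contained workspace can be sketched with the standard library alone. The example below does not use the SDK's Manifest class and assumes nothing about its real API. WorkspaceSpec, materialize, and resolve_inside are hypothetical names, and the file content is a placeholder built from the figure quoted earlier.

```python
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class WorkspaceSpec:
    """Hypothetical stand-in for a manifest: workspace path -> local source dir."""
    entries: dict[str, Path] = field(default_factory=dict)


def materialize(spec: WorkspaceSpec, root: Path) -> None:
    """Copy each declared source directory into the workspace root."""
    for rel_path, source in spec.entries.items():
        shutil.copytree(source, root / rel_path)


def resolve_inside(root: Path, rel_path: str) -> Path:
    """Resolve rel_path under root, rejecting paths that escape the workspace."""
    candidate = (root / rel_path).resolve()
    if not candidate.is_relative_to(root.resolve()):
        raise PermissionError(f"{rel_path!r} is outside the workspace")
    return candidate


# Demo: stage a throwaway source directory, then read from the workspace copy.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source_data"
    source.mkdir()
    (source / "metrics.md").write_text("FY2024 revenue: $98.7M\n")  # placeholder

    workspace = Path(tmp) / "workspace"
    workspace.mkdir()
    materialize(WorkspaceSpec(entries={"data": source}), workspace)

    staged_text = resolve_inside(workspace, "data/metrics.md").read_text()
    try:
        resolve_inside(workspace, "../source_data/metrics.md")
        escape_blocked = False
    except PermissionError:
        escape_blocked = True

print(staged_text.strip(), "| escape blocked:", escape_blocked)
```

Declaring the inputs up front is what makes the workspace portable: the same spec can be materialized on a laptop or in a hosted sandbox. The path check is only a simplified illustration of containment, not a security boundary.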

Built-in clients for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel keep the environment consistent from prototype to production and give models predictable inputs and outputs, which helps keep long-running work organized. Having this layer out of the box helps prevent brittle prototypes from failing in production.

Scale and Secure with Harness-Compute Separation

Keeping agent state outside the compute environment protects credentials from prompt injection and exfiltration attempts inside the sandbox. Built-in snapshotting and rehydration let a run resume from its last checkpoint if a sandbox fails or expires, which makes long tasks durable.
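The snapshot-and-rehydrate pattern can be illustrated with a minimal sketch that stores progress in a JSON file outside the simulated sandbox. This is not the SDK's mechanism. The run_steps function, the fail_at parameter, and the step names are hypothetical, and the sketch assumes step names are unique.

```python
import json
import tempfile
from pathlib import Path


def run_steps(steps, checkpoint: Path, fail_at=None):
    """Run steps in order, saving progress after each completed step.

    State lives in the checkpoint file, outside the simulated sandbox,
    so a fresh call can resume where a failed one stopped.
    """
    # Rehydrate: load earlier progress if a checkpoint exists.
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {"done": []}
    for index, step in enumerate(steps):
        if step in state["done"]:
            continue  # already completed in an earlier run
        if fail_at == index:
            raise RuntimeError(f"sandbox lost before step {step!r}")
        state["done"].append(step)
        # Snapshot: persist after every completed step.
        checkpoint.write_text(json.dumps(state))
    return state["done"]


steps = ["parse", "analyze", "report"]
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "run_state.json"
    try:
        run_steps(steps, checkpoint, fail_at=1)  # simulated sandbox failure
    except RuntimeError:
        pass
    saved_after_failure = json.loads(checkpoint.read_text())["done"]
    completed = run_steps(steps, checkpoint)  # resumes from the checkpoint

print(saved_after_failure, "->", completed)
```

The second call skips the step that finished before the failure and runs only the remaining two, which is the behavior the checkpointing described above is meant to provide.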

Route subagents to isolated containers, invoke compute only when it is needed, or parallelize work across many containers for speed, which suits workloads that coordinate diverse tools and systems. Pricing follows standard API rates for tokens and tool calls. The new capabilities are generally available in Python now; TypeScript support, code mode, and subagents are coming soon. Future releases are expected to add more sandbox providers and integrations to fit the wider ecosystem.
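Fanning work out across isolated workers can be sketched with asyncio. The example below is a generic concurrency pattern, not the SDK's subagent API. The run_subagent and fan_out functions, the task names, and the max_parallel limit are all hypothetical, and a short sleep stands in for real sandboxed work.

```python
import asyncio


async def run_subagent(task: str, limit: asyncio.Semaphore) -> str:
    """Placeholder for a subagent run in its own isolated container."""
    async with limit:  # cap how many simulated sandboxes are active at once
        await asyncio.sleep(0.01)  # stands in for real work
        return f"{task}: done"


async def fan_out(tasks: list[str], max_parallel: int = 2) -> list[str]:
    """Run all tasks concurrently, at most max_parallel at a time."""
    limit = asyncio.Semaphore(max_parallel)
    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(run_subagent(t, limit) for t in tasks))


results = asyncio.run(fan_out(["billing", "claims", "scheduling"]))
print(results)
```

Capping concurrency with a semaphore reflects the "invoke compute only as needed" idea: the number of active workers stays bounded even when many tasks are queued.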

Summarized by x-ai/grok-4.1-fast via openrouter

© 2026 Edge