From Chat Primitives to Agentic Platforms

AI SDK 7 marks a shift from basic model interaction toward a full agent platform. The release focuses on three agentic capabilities: reasoning control, durable execution, and observability. By centralizing reasoning configuration and introducing a WorkflowAgent, the SDK lets developers build agents that persist state across steps, survive process restarts, and handle complex multi-turn tasks reliably.
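
The idea of persisting state across steps can be illustrated with a generic checkpoint-and-resume loop. This is only a minimal sketch of the technique, not the WorkflowAgent API: CheckpointStore, InMemoryStore, RunState, Step, and runDurably are hypothetical names invented for illustration, and a real store would write to a database or disk rather than to memory.

```typescript
// Hypothetical types for illustration; these are not part of the AI SDK.
interface RunState {
  nextStep: number; // index of the first step that has not completed yet
  outputs: string[]; // results of the steps completed so far
}

interface CheckpointStore {
  load(runId: string): RunState | undefined;
  save(runId: string, state: RunState): void;
}

// A step receives the outputs of all earlier steps and returns its own output.
type Step = (previous: string[]) => string;

// Stand-in for durable storage. State is serialized on save so the stored
// copy is independent of the object the caller keeps mutating.
class InMemoryStore implements CheckpointStore {
  private readonly runs = new Map<string, string>();

  load(runId: string): RunState | undefined {
    const raw = this.runs.get(runId);
    return raw === undefined ? undefined : (JSON.parse(raw) as RunState);
  }

  save(runId: string, state: RunState): void {
    this.runs.set(runId, JSON.stringify(state));
  }
}

// Runs the remaining steps of a run, checkpointing after each completed step.
// If a step throws, the last checkpoint stays in the store, so calling
// runDurably again with the same runId resumes at the failed step.
function runDurably(runId: string, steps: Step[], store: CheckpointStore): string[] {
  const state: RunState = store.load(runId) ?? { nextStep: 0, outputs: [] };
  for (const step of steps.slice(state.nextStep)) {
    state.outputs.push(step(state.outputs));
    state.nextStep += 1;
    store.save(runId, state);
  }
  return state.outputs;
}
```

Under these assumptions, a run that fails partway through can be retried with the same runId, and steps that already completed are not executed again.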

Production-Grade Reliability and Observability

To move beyond local demos, the SDK introduces hardened primitives for production environments:

  • Tool Approvals: Developers can define granular approval policies (user-required, auto-approve, or delegated). Tool arguments are HMAC-signed so that tampering can be detected.
  • Durable Execution: The WorkflowAgent persists state to storage, ensuring that long-running tasks remain intact during deployments or interruptions.
  • Observability: Telemetry is now global and structured, and has moved to a dedicated @ai-sdk/otel package. Performance metrics such as time-to-first-token, tool execution time, and throughput can be tracked consistently across all agent steps.
  • Timeout Budgets: Developers can set separate timeout budgets for total execution, individual steps, chunks, or specific tools, which bounds how long unpredictable model behavior can stall a run.
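
To make the HMAC signing behind tool approvals concrete, here is a minimal sketch of signing tool arguments at approval time and verifying them before execution. It uses Node's built-in crypto module; SignedToolCall, signToolCall, and verifyToolCall are hypothetical names chosen for illustration, and the SDK's actual signing format and API may well differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shape of an approved tool call; not the SDK's actual type.
interface SignedToolCall {
  toolName: string;
  args: string; // JSON-serialized arguments, exactly as they were approved
  signature: string; // hex-encoded HMAC-SHA256 over the tool name and args
}

function computeSignature(secret: string, toolName: string, args: string): string {
  // Encoding the pair as a JSON array keeps the field boundary unambiguous,
  // so ("ab", "c") and ("a", "bc") never produce the same signed payload.
  const payload = JSON.stringify([toolName, args]);
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Called when a tool call is approved: freezes the arguments and signs them.
function signToolCall(secret: string, toolName: string, args: unknown): SignedToolCall {
  const serialized = JSON.stringify(args);
  return { toolName, args: serialized, signature: computeSignature(secret, toolName, serialized) };
}

// Called before execution: returns false if the name, arguments, or signature
// were changed after approval, or if a different secret was used.
function verifyToolCall(secret: string, call: SignedToolCall): boolean {
  const expected = Buffer.from(computeSignature(secret, call.toolName, call.args), "hex");
  const actual = Buffer.from(call.signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

The signature covers the serialized arguments rather than the parsed object, so verification does not depend on key ordering at the time of the check. The secret is assumed to stay on the server; a client that holds it could re-sign altered arguments.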

Unified Integration and Media Handling

AI SDK 7 standardizes how agents interact with the outside world:

  • Harness Layer: The new HarnessAgent allows developers to wrap existing agent runtimes (like Claude Code or Codex) within the standard AI SDK interface, enabling consistent streaming and tool usage across different agent ecosystems.
  • Media and MCP: The SDK now treats images, files, and video as canonical parts of the message stream. MCP (Model Context Protocol) support has been expanded to include sandboxed iframe rendering for app UIs, allowing agents to interact with and display custom tool interfaces directly.
  • Realtime and Multimodal: Experimental support for realtime communication over WebSockets and for video generation is now included, allowing for richer multimodal agent interactions.
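
One way to picture media as first-class message parts is a tagged union that consumers switch over. The sketch below is an assumption made for illustration only: the MessagePart shapes, their field names, and describePart are hypothetical and are not taken from the SDK's type definitions.

```typescript
// Hypothetical part shapes for illustration; the SDK's real types may differ.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "image"; mediaType: string; data: Uint8Array }
  | { type: "file"; mediaType: string; filename: string; data: Uint8Array }
  | { type: "video"; mediaType: string; url: string };

// Produces a short log-friendly summary of a part. Because the switch covers
// every member of the union, the compiler flags any part type added later
// that is not handled here.
function describePart(part: MessagePart): string {
  switch (part.type) {
    case "text":
      return `text (${part.text.length} chars)`;
    case "image":
      return `image ${part.mediaType}, ${part.data.byteLength} bytes`;
    case "file":
      return `file ${part.filename} (${part.mediaType})`;
    case "video":
      return `video ${part.mediaType} at ${part.url}`;
  }
}
```

The benefit of a single union like this is that rendering, logging, and storage code can treat every part uniformly instead of special-casing media attachments.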