The Spectrum of UI Generation

As LLMs have evolved to produce high-fidelity, accessible frontend code, the industry is moving through three distinct phases of UI generation:

  • Static Components: The current standard. Agents act as orchestrators, passing data and props to pre-built React components. Examples include the AG-UI protocol and Goose's auto-visualizer. This approach is predictable, but it limits the agent to interfaces that the component library already anticipates.
  • Declarative UI: A middle ground where agents generate descriptors (JSON, YAML, or Python) that a rendering engine maps to a design system. This offers a balance of flexibility and consistency, similar to how Netflix uses server-driven UI for personalization. Tools like Vercel's JSON Render exemplify this approach.
  • Generative UI: The frontier where models write raw HTML, CSS, and JavaScript on demand. This bypasses static component libraries entirely, allowing agents to create highly imaginative, context-specific interfaces in a single tool call.
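
To make the middle option concrete, here is a minimal sketch of a declarative renderer. It is not the API of any tool named above: the component names (card, text, button), the descriptor shape, and the render function are all assumptions chosen for illustration. The agent emits only data, and the renderer decides which markup that data may become.

```python
from html import escape

# Hypothetical allowlist: each component type maps to a function that
# builds HTML from escaped props and already-rendered children.
def _card(props, children):
    title = escape(str(props.get("title", "")))
    return f'<section class="card"><h2>{title}</h2>{children}</section>'

def _text(props, children):
    return f"<p>{escape(str(props.get('value', '')))}</p>"

def _button(props, children):
    label = escape(str(props.get("label", "")))
    action = escape(str(props.get("action", "")), quote=True)
    return f'<button data-action="{action}">{label}</button>'

COMPONENTS = {"card": _card, "text": _text, "button": _button}

def render(node):
    """Render one descriptor node, rejecting types outside the allowlist."""
    kind = node.get("type")
    if kind not in COMPONENTS:
        raise ValueError(f"unknown component type: {kind!r}")
    children = "".join(render(child) for child in node.get("children", []))
    return COMPONENTS[kind](node.get("props", {}), children)

# A descriptor as an agent might emit it (illustrative data only).
descriptor = {
    "type": "card",
    "props": {"title": "Weekly <sales>"},
    "children": [
        {"type": "text", "props": {"value": "Revenue is up 12%."}},
        {"type": "button", "props": {"label": "Details", "action": "open_report"}},
    ],
}

html_output = render(descriptor)
```

Because the renderer escapes every prop and refuses unknown types, the agent can vary layout and content but cannot inject arbitrary markup. That constraint is exactly what Generative UI gives up in exchange for expressiveness.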

The Necessity of Containment and Delivery

Generative UI introduces significant security and trust challenges. Because LLM-generated code is inherently untrusted, it requires strict sandboxing. The Model Context Protocol (MCP) serves as the ideal delivery mechanism for this approach because it provides built-in authentication, tool calling, and, crucially, a default double-iframe sandbox for third-party content. Even first-party features, such as Anthropic's visualizer, increasingly use MCP to handle the complexities of delivering dynamic, agent-generated interactions.
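
The basic containment idea can be sketched with a single sandboxed iframe. This is a simplified illustration and not MCP's actual double-iframe implementation: the sandboxed_frame helper is hypothetical, and only the HTML sandbox and srcdoc attributes are standard.

```python
from html import escape

def sandboxed_frame(untrusted_html: str, allow_scripts: bool = True) -> str:
    """Wrap untrusted markup in an iframe with a restrictive sandbox attribute."""
    # "allow-same-origin" is deliberately omitted: without it the frame gets
    # an opaque origin and cannot read the host page's cookies, storage, or DOM.
    tokens = "allow-scripts" if allow_scripts else ""
    # srcdoc carries the document inline; escaping keeps the generated
    # markup from breaking out of the attribute value.
    srcdoc = escape(untrusted_html, quote=True)
    return f'<iframe sandbox="{tokens}" srcdoc="{srcdoc}"></iframe>'

# Stand-in for model-generated markup (illustrative only).
generated = '<button onclick="alert(1)">Run</button>'
frame = sandboxed_frame(generated)
```

An empty sandbox value applies every restriction, so passing allow_scripts=False yields a frame that can display markup but not execute it. A production host would typically add further layers, such as a separate origin for the outer frame and a content security policy.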

Moving Toward Human-Agent Collaboration

We are currently in the "radio era" of AI interfaces: we use new technology to replicate old patterns (like chat windows) because we have not yet imagined what the new medium can be. The next evolution will move beyond simple visualization toward true human-agent collaboration. The Excalidraw MCP app serves as a prototype for this shift: it provides a shared canvas where humans and agents can interact, modify, and co-create in real time, rather than treating the UI as a one-way output from the agent to the user.