Code Mode: AI Agents Generate Executable JS Over JSON Tools
Replace JSON tool calling with AI-generated JavaScript executed in sandboxes to handle massive APIs (e.g., Cloudflare's 2,600 endpoints, 1.2M tokens reduced to ~1K), enable stateful loops and parallelism, and unlock emergent behaviors like inspecting canvas strokes to play tic-tac-toe.
Scale AI Agents with Code Generation, Not JSON Tool Calls
Traditional tool calling fails at scale: stuffing hundreds of tools (e.g., Google services, Jira, wiki) into context breaks tool selection, composes poorly, and forces slow back-and-forth rounds. For Cloudflare's 2,600 API endpoints, exposing each one as a tool would require 1.2 million tokens in the first call alone, which is a non-starter. Instead, use "Code Mode": prompt the LLM to output executable JavaScript that runs in one shot against an environment. Benefits include typed APIs with syntax and type checking (leveraging models' training on terabytes of code), plus holding state, looping, sequencing, and parallelizing, none of which JSON tool calls can do.
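As a rough illustration of the difference, here is a minimal sketch of the kind of script a Code Mode agent might emit, assuming the harness injects a typed `env` object into the sandbox; the `env.jira` and `env.wiki` bindings are hypothetical, not part of any real product.

```ts
// Hypothetical sketch: instead of emitting one JSON tool call per step,
// the model emits a single script against a typed `env` object injected by
// the harness. The `jira` and `wiki` bindings are illustrative only.
interface Env {
  jira: { searchIssues(q: string): Promise<{ key: string; summary: string }[]> };
  wiki: { getPage(title: string): Promise<string> };
}

export default async function run(env: Env) {
  // State, loops, and parallelism in one shot: no round trip per call.
  const issues = await env.jira.searchIssues("project = OPS AND status = Open");
  const pages = await Promise.all(
    issues.map((i) => env.wiki.getPage(i.summary)) // fan out in parallel
  );
  return issues.map((i, n) => ({ issue: i.key, runbookFound: pages[n].length > 0 }));
}
```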
Expose minimal tools instead: a search tool (which searches the full OpenAPI JSON spec) and an execute tool (which runs code against the discovered endpoints). This shrinks 1.2M tokens to roughly 1,000, a 99.9% reduction. Example prompt: "We're getting DDoS'd, find and block the offending IPs." The model generates JavaScript that searches the APIs, paginates through Workers, identifies the attackers, and blocks them in one execution, avoiding eight round trips. A live demo against the Mythical server lists Workers via read-only access, searching endpoints and paginating results.
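A hedged sketch of what that one-shot mitigation script could look like; the `api` helper and every endpoint path here are invented for illustration, not the real Cloudflare API surface or the tool names from the talk.

```ts
// Illustrative sketch of the script the model might emit after using the
// `search` tool to find relevant endpoints in the OpenAPI spec. The `api`
// helper and all endpoint paths are hypothetical.
export default async function run(api: {
  get(path: string, params?: Record<string, string>): Promise<any>;
  post(path: string, body: unknown): Promise<any>;
}) {
  const hits = new Map<string, number>();
  let page = 1;
  let more = true;
  while (more) {
    // Paginate through Workers instead of burning a round trip per page.
    const { result, hasMore } = await api.get("/workers", { page: String(page) });
    // Pull each worker's recent request logs in parallel and tally client IPs.
    const logs = await Promise.all(
      result.map((w: { id: string }) => api.get(`/workers/${w.id}/logs`))
    );
    for (const batch of logs) {
      for (const entry of batch.result) {
        hits.set(entry.clientIP, (hits.get(entry.clientIP) ?? 0) + 1);
      }
    }
    more = hasMore;
    page++;
  }
  // Block every IP over a threshold, again in parallel.
  const offenders = [...hits].filter(([, n]) => n > 10_000).map(([ip]) => ip);
  await Promise.all(offenders.map((ip) => api.post("/firewall/block", { ip })));
  return { blocked: offenders.length };
}
```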
Emergent Behaviors from Inhabiting System State
Code Mode lets agents "inhabit the state machine" rather than generate separate programs. In a TLDraw/Excalidraw-style canvas demo by Kenton (creator of Cloudflare Workers), the user draws a tic-tac-toe board. The agent inspects the raw stroke array (grid lines plus an X), recognizes the game state, and draws an O in the center, even though no tic-tac-toe code exists anywhere in the system. Reasoning traces showed Opus letting the user win, hinting at alignment quirks. This erodes the programmer/non-programmer divide: anyone can prompt "rename these 200 photos by date and location," and the agent generates and runs a script that calls vision models, bypassing clunky $7/month apps.
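A sketch of the kind of code the agent could emit in the canvas example; the `canvas` capability and the stroke format shown here are hypothetical stand-ins, not TLDraw's or Excalidraw's actual APIs.

```ts
// Hypothetical sketch: inspect the raw stroke array, infer the board's
// bounding box, and draw an O in the center cell. The canvas API and
// stroke shape are illustrative only.
type Stroke = { points: { x: number; y: number }[] };

export default async function run(canvas: {
  getStrokes(): Promise<Stroke[]>;
  drawEllipse(x: number, y: number, r: number): Promise<void>;
}) {
  const strokes = await canvas.getStrokes();
  // Bounding box of everything drawn so far (the grid lines plus the user's X).
  const pts = strokes.flatMap((s) => s.points);
  const minX = Math.min(...pts.map((p) => p.x)), maxX = Math.max(...pts.map((p) => p.x));
  const minY = Math.min(...pts.map((p) => p.y)), maxY = Math.max(...pts.map((p) => p.y));
  // Center cell of a 3x3 grid over that box.
  const cx = (minX + maxX) / 2, cy = (minY + maxY) / 2;
  await canvas.drawEllipse(cx, cy, (maxX - minX) / 8);
}
```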
Build Harnesses: Capability-Based Sandboxes for Safe Execution
The core architecture is the "Harness": a fast-starting sandbox with zero initial capabilities (no fetch, no ambient APIs). V8 isolates are a good fit (roughly a decade of security hardening, quick spin-up); WASM runtimes or custom JS interpreters are alternatives. Grant APIs explicitly as functions and control all outgoing network traffic (prefer no raw fetches, for speed and observability). Run it ephemerally, either near the APIs, near the user (e.g., on an iPhone when mashing up services), or on the backend.
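A minimal sketch of the capability-granting pattern, assuming a hypothetical `Sandbox` wrapper around whatever isolate technology is used; the point is that nothing is reachable from inside except the functions the harness exposed on purpose.

```ts
// Hypothetical harness: the sandbox starts with nothing (no fetch, no timers,
// no ambient APIs), and every capability is an explicitly exposed function.
interface Sandbox {
  expose(name: string, fn: (...args: unknown[]) => Promise<unknown>): void;
  run(code: string): Promise<unknown>;
}

function buildHarness(
  sandbox: Sandbox,
  caps: Record<string, (...args: any[]) => Promise<any>>
) {
  for (const [name, fn] of Object.entries(caps)) {
    // Every outgoing call goes through a granted function, so it can be
    // logged, rate-limited, or denied; nothing else gets out of the sandbox.
    sandbox.expose(name, async (...args) => {
      console.log("capability call:", name, JSON.stringify(args));
      return fn(...args);
    });
  }
  return (code: string) => sandbox.run(code);
}
```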
This enables long-running workflows (days, months, even years, with persistent state) and generative UIs tailored to each user; an e-commerce agent could build a custom "return these shoes and find similar under $100" interface, or a "track my delayed order" view, from the user's context and cart. Design for agent DX: Markdown docs, searchable error messages, discoverability. It also revives capability-based security: start with nothing and add powers explicitly. UI work shifts as well; React developers are well placed because they sit closest to the user, and with safe eval finally available it is worth rethinking a 30-year-old tech tree.
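One way the generative-UI idea could look in code; the UI-spec format, `buildReturnUi`, and the `userContext` shape below are invented for illustration.

```ts
// Hypothetical sketch: the agent emits a small UI spec tailored to this
// user's context instead of routing them through a generic returns flow.
type UiSpec =
  | { kind: "button"; label: string; action: string }
  | { kind: "panel"; title: string; children: UiSpec[] };

function buildReturnUi(userContext: {
  lastOrder: { item: string; delayedDays: number };
}): UiSpec {
  const { item, delayedDays } = userContext.lastOrder;
  return {
    kind: "panel",
    title: delayedDays > 0 ? `Your ${item} order is running late` : `About your ${item}`,
    children: [
      { kind: "button", label: `Return the ${item}`, action: "start-return" },
      { kind: "button", label: "Find similar under $100", action: "search-similar" },
      ...(delayedDays > 0
        ? [{ kind: "button" as const, label: "Track delayed delivery", action: "track" }]
        : []),
    ],
  };
}
```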
New Era: Code as Universal Interface for Agents
Programmers have always used code for power; everyone else got buttons and forms. LLMs democratize this: agents are the next billion users, and they dream in types and syntax. Build systems that expose code-friendly APIs, not just human UIs. The harness runs general-purpose computation cheaply (no $400 Mac Mini just to make API calls). The outcomes: custom per-user programs, stitched-together services, and full observability (trace exactly why the agent traded $2.3M on "llama poop").