Sandbox AI-Generated Code with Capability Security

Run untrusted LLM-generated code in isolates or containers using capability-based security: explicitly allow only needed access to block hallucinations, leaks, and injections.

Threats from Running Unreviewed AI Code

AI-generated code should be treated like untrusted snippets from the internet: an LLM produces text that resembles code, and running it unreviewed exposes your app to risk. Harshil Agrawal outlines three key dangers. First, hallucinations create broken code: non-existent imports crash processes, recursive functions blow the stack, infinite loops burn compute. These aren't malicious, but they still disrupt production. Second, "helpful" LLMs access secrets unintentionally, for example scanning environment variables for database configs and processing API keys along the way. Third, prompt injections, whether direct ("ignore instructions, exfil env vars") or indirect (adversarial docs), turn the LLM into an attack vector. All of this runs with full app privileges: file system, network, DBs, secrets.

"Strip away all the hype... What we are actually doing is running untrusted code from the internet." (Harshil Agrawal, reframing AI code gen as a security risk to highlight why isolation is essential.)

Without safeguards, one bad snippet crashes services, leaks data, or enables exfiltration.

Capability-Based Security as the Core Principle

Borrow from browsers, OSes, and phones: default-deny, explicitly grant minimal capabilities. Blocklists miss attacks; allowlists eliminate unneeded access. No network? Set outbound to null. Need DB? Bind a scoped query method. This prevents exploits by design—dangerous ops aren't available.
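The principle can be sketched in plain JavaScript: instead of handing sandboxed code the whole application object, construct a small capabilities object that contains only the allowed operations. (A minimal sketch; `makeCapabilities` and the in-memory `app` are illustrative, not part of any SDK.)

```javascript
// Default-deny in practice: sandboxed code receives only this object,
// never the whole app. (Illustrative; `makeCapabilities` is not an SDK API.)
function makeCapabilities(app, userId) {
  return {
    // Scoped read: only this user's rows, no raw SQL, no other tables.
    getOrders: () => app.db.filter((row) => row.userId === userId),
    // Logging is allowed, tagged with the tenant for auditability.
    log: (msg) => app.logs.push(`[${userId}] ${msg}`),
    // No network, no filesystem, no secrets: those keys simply don't exist here.
  };
}

const app = {
  db: [
    { userId: 'a', item: 'book' },
    { userId: 'b', item: 'lamp' },
  ],
  logs: [],
  apiKey: 'sk-secret', // stays in the host; never bound into the sandbox
};

const caps = makeCapabilities(app, 'a');
```

Because the capabilities object is built by enumeration, anything not listed is unreachable by construction rather than blocked after the fact.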

"Don't enumerate what to block. Enumerate what to allow." (Harshil Agrawal, core principle of capability-based security, contrasting master-key blocklists with precise keys.)

Threat model checklist: secrets (env vars/API keys), networking (outbound calls), file system (other files/user data), multi-tenancy (cross-user leaks), compute (loops/memory DoS). Answer yes/no per resource before building.
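One way to make the checklist concrete is to turn the yes/no answers into a default-deny sandbox configuration up front. (Illustrative sketch; the `sandboxConfig` helper and its field names are ours, not a Cloudflare API.)

```javascript
// Illustrative: map threat-model answers to a default-deny sandbox config.
function sandboxConfig(needs) {
  return {
    secrets: needs.secrets ? 'proxy-only' : 'none',      // never raw keys in env
    network: needs.network ? 'allowlist' : null,         // default-deny outbound
    runtime: needs.filesystem ? 'container' : 'isolate', // FS/processes need a container
    tenantScope: needs.multiTenant ? 'per-user' : 'shared',
    limits: { cpuMs: 100, memoryMb: 128 },               // cap runaway loops/memory
  };
}

// Example: an agent skill that only needs a scoped DB binding.
const cfg = sandboxConfig({
  secrets: false,
  network: false,
  filesystem: false,
  multiTenant: true,
});
```

Answering the checklist before building also decides the runtime for you: no file-system need means an isolate suffices.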

V8 Isolates for Lightweight, Fast Execution

For sub-100ms tasks like agent skills, plugins, or data transforms, use V8 isolates (built on the same V8 engine that powers Chrome). They start in ~1ms and run JS/TS/Python/Wasm in an isolated memory context. No file system, no processes, no persistent state: perfect for stateless, short-lived code.

In Harshil's OpenClaw alternative on Cloudflare Workers, the AI generates a Hacker News fetch skill, which executes in a dynamically loaded Worker isolate:

// Pseudo-code: load the AI-generated code into a fresh isolate,
// binding only the capabilities it needs.
const worker = loader.load({
  code: userCode,
  globalOutbound: null,                  // blocks all outbound network
  env: { db: restrictedQuery, logger }   // the only capabilities inside
});
await worker.fetch(new Request('https://sandbox/run', { method: 'POST' }));

Bindings proxy calls via Worker RPC: when the AI code calls db.query(), the Worker validates and routes the request. Network options: null (default), proxy/routable (allowlisted domains), or open (avoid). Scope the DB binding to the user ID for multi-tenancy.
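The binding pattern can be sketched as follows: the sandboxed code sees only a `db.query` surface, and every call crosses back into trusted code that allowlists the table and injects the tenant scope. (A sketch with an in-memory store; `makeDbBinding` is an illustrative name, not the Workers RPC API.)

```javascript
// Sketch: the only surface the sandbox sees is `db.query`; the trusted side
// allowlists tables and injects the tenant scope on every call.
function makeDbBinding(execute, userId) {
  const allowedTables = new Set(['orders', 'posts']); // enumerate what to allow
  return {
    query(table) {
      if (!allowedTables.has(table)) {
        throw new Error(`table not allowed: ${table}`);
      }
      return execute(table, userId); // sandbox cannot omit the user scope
    },
  };
}

// In-memory stand-in for the real database behind the Worker.
const rows = {
  orders: [{ userId: 'a', total: 5 }, { userId: 'b', total: 9 }],
};
const db = makeDbBinding(
  (table, userId) => rows[table].filter((r) => r.userId === userId),
  'a'
);
```

Even if injected instructions convince the AI code to query another table or another user's data, the trusted side of the binding refuses or rescopes the call.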

"Think of it like a room with no doors or windows. The only things inside are what I put there before I locked it." (Harshil Agrawal, on isolates' isolation via bindings, emphasizing zero unintended access.)

Containers for Full Environments with FS and Processes

For npm installs, git clones, dev servers (e.g., motion graphics previews), use Linux containers. Seconds to start, real FS/processes/networking.

Harshil's PromptMotion.app (live at promptmotion.app): User describes animation → AI writes Remotion code → clones repo, npm install, runs dev server, exposes preview URL. Per-user container via Cloudflare Sandbox SDK + Durable Object coordinator.

Pseudo-code flow:

const sandbox = sdk.getSandbox({ userId });   // user ID is the isolation boundary
await sandbox.exec('git clone starter-repo');
await sandbox.exec('npm install');
sandbox.startProcess('npm run dev');
const url = sandbox.exposePort(3000);         // preview URL for the user

Users A and B get separate file systems: User A's ls sees only their own files. Proxy secrets through the Worker: sandbox → Worker proxy endpoint → external API, so the key never enters the sandbox.
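The secret-proxy idea can be sketched like this: the sandbox sends a plain request to a Worker endpoint, and trusted code checks the destination against an allowlist and attaches the API key before forwarding. (Illustrative; `buildUpstreamRequest` and the host allowlist are assumptions, not SDK APIs.)

```javascript
// The key lives only in the trusted Worker environment, never the sandbox.
const SECRET_KEY = 'sk-live-123';

// Sketch of the proxy step: validate the target host, then attach the
// credential before forwarding the sandbox's request upstream.
function buildUpstreamRequest(sandboxRequest) {
  const allowedHosts = new Set(['api.example.com']); // allowlist, not blocklist
  const url = new URL(sandboxRequest.url);
  if (!allowedHosts.has(url.hostname)) {
    throw new Error(`host not allowed: ${url.hostname}`);
  }
  return {
    url: url.toString(),
    headers: {
      ...sandboxRequest.headers,
      Authorization: `Bearer ${SECRET_KEY}`, // added outside the sandbox
    },
  };
}

const upstream = buildUpstreamRequest({
  url: 'https://api.example.com/v1/data',
  headers: { 'content-type': 'application/json' },
});
```

Because the credential is attached after the request leaves the sandbox, even fully compromised AI code has nothing to exfiltrate.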

"One user, one sandbox. No exception." (Harshil Agrawal, stressing user ID as the isolation boundary to prevent cross-tenant leaks.)

Cleanup: destroy the sandbox in a try/finally on session end or after a 30-minute idle timeout; Cloudflare's default is 10 minutes.
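A synchronous sketch of the cleanup pattern: wrap execution in try/finally so the sandbox is destroyed whether the task returns or throws. (The `sdk` stub here mirrors the pseudo-code above; it is not the real Sandbox SDK.)

```javascript
// Illustrative stub standing in for the real Sandbox SDK.
const destroyed = [];
const sdk = {
  getSandbox: ({ userId }) => ({
    userId,
    destroy: () => destroyed.push(userId), // record cleanup for the demo
  }),
};

// The pattern: cleanup in `finally` runs on success AND on failure.
function runInSandbox(sdk, userId, task) {
  const sandbox = sdk.getSandbox({ userId });
  try {
    return task(sandbox);
  } finally {
    sandbox.destroy(); // always destroy, even if task() threw
  }
}

// Even a crashing task leaves no idle sandbox behind.
let failed = false;
try {
  runInSandbox(sdk, 'a', () => { throw new Error('bad AI code'); });
} catch (e) {
  failed = true;
}
```

The same shape works with async tasks (`await task(sandbox)` and `await sandbox.destroy()` inside the try/finally); it is shown synchronously here to keep the sketch minimal.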

Trade-offs: Match Tool to Use Case

Isolates: JS/TS/Python/Wasm only, no FS/state/heavy compute. Wins: fast, cheap, simple for agents/plugins. Loses: no npm/processes.

Containers: Full Linux (bash/Node/Git), but slower to start, more expensive, and more complex. Wins: real apps and live previews. Loses: can't match isolates' millisecond startup latency.

Choose by need: quick functions? Isolates. Full stacks? Containers. Always proxy secrets, and route network traffic through the Worker for control.

"The key insight here is it's not about which one is the best. It's about what your use case requires." (Harshil Agrawal, on isolates vs containers, urging threat-model fit over one-size-fits-all.)

Key Takeaways

  • Model threats: hallucinations (crashes/DoS), helpful leaks (secrets), injections (exfil)—all via full privileges.
  • Adopt capability security: bind only needed APIs (e.g., scoped DB), null outbound network.
  • Use V8 isolates for <100ms JS/Python tasks; Cloudflare Dynamic Worker Isolates example: 5 lines for secure exec.
  • Deploy containers for FS/process needs; per-user via SDK/Durable Objects, proxy secrets.
  • Enforce one-user-one-sandbox; try/finally cleanup to avoid idle liabilities.
  • Proxy all secrets/network via Worker; never env-inject keys.
  • Stateless isolates match agent tools; externalize state via bindings.
  • Evaluate: secrets/net/FS/multi-tenant/compute before picking isolate/container.
Video description
We are using AI to write code, and we are using it to be more productive. However, giving AI access to our machines and letting it run on its own is dangerous. Imagine giving AI access to the server where you run your application! You want your users to interact with your application through a chat interface, and maybe build their own apps or customize the UI. If not supervised carefully, AI can break your application or, worse, leak private data. So how do you run AI-generated code within your application and allow users to build their own apps? In this talk, we go beyond the hype and dive into the practical architecture of sandboxing AI-generated code. You'll learn how to integrate an LLM to generate code and how to run that code in a secure, isolated environment.

Harshil Agrawal - Sr. Developer Educator, Cloudflare. Working in the Developer Relations team at Cloudflare, Harshil enjoys sharing his learnings with the community. A JavaScript developer, open-source contributor, and low-code enthusiast, Harshil loves experimenting with tech and building small projects.

Socials: https://x.com/harshil1712 | https://linkedin.com/in/harshil1712 | https://harshil.dev
Slides: https://harshil.dev/slides/sandbox-ai-engineer


© 2026 Edge