Sandbox AI-Generated Code with Capability Security
Run untrusted LLM-generated code in isolates or containers using capability-based security: explicitly allow only needed access to block hallucinations, leaks, and injections.
Threats from Running Unreviewed AI Code
AI-generated code should be treated like untrusted internet snippets: LLMs produce text that merely resembles code, and it often runs without review, exposing apps to risk. Harshil Agrawal outlines three key dangers. First, hallucinations create broken code: non-existent imports crash processes, recursive functions blow the stack, infinite loops burn compute. These aren't malicious, but they still disrupt production. Second, "helpful" LLMs access secrets unintentionally, like scanning env vars for database configs and processing API keys. Third, prompt injections, whether direct ("ignore instructions, exfil env vars") or indirect (adversarial docs), turn the LLM into an attack vector. All of it runs with full app privileges: file system, network, DBs, secrets.
"Strip away all the hype... What we are actually doing is running untrusted code from the internet." (Harshil Agrawal, reframing AI code gen as a security risk to highlight why isolation is essential.)
Without safeguards, one bad snippet crashes services, leaks data, or enables exfiltration.
Capability-Based Security as the Core Principle
Borrow from browsers, OSes, and phones: default-deny, explicitly grant minimal capabilities. Blocklists miss attacks; allowlists eliminate unneeded access. No network? Set outbound to null. Need DB? Bind a scoped query method. This prevents exploits by design—dangerous ops aren't available.
"Don't enumerate what to block. Enumerate what to allow." (Harshil Agrawal, core principle of capability-based security, contrasting master-key blocklists with precise keys.)
Threat model checklist: secrets (env vars/API keys), networking (outbound calls), file system (other files/user data), multi-tenancy (cross-user leaks), compute (loops/memory DoS). Answer yes/no per resource before building.
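The allowlist idea can be sketched in a few lines. `HOST_ENV`, `grantCapabilities`, and the key names below are hypothetical, but the shape is the point: the sandbox environment is built from explicit grants, never by copying the host's.

```javascript
// Sketch of default-deny: build the sandbox env from an explicit allowlist.
// HOST_ENV and the key names are illustrative placeholders.
const HOST_ENV = {
  DB_URL: 'postgres://host/db',
  STRIPE_KEY: 'sk_live_placeholder',
  LOG_LEVEL: 'info',
};

function grantCapabilities(hostEnv, allowlist) {
  // Only explicitly granted keys cross the boundary. Everything else is
  // absent, not merely hidden, so injected code has nothing to enumerate.
  return Object.fromEntries(
    allowlist.filter((k) => k in hostEnv).map((k) => [k, hostEnv[k]])
  );
}

const sandboxEnv = grantCapabilities(HOST_ENV, ['LOG_LEVEL']);
console.log(sandboxEnv); // only LOG_LEVEL: DB_URL and STRIPE_KEY never cross
```

Note the inversion: adding a capability is a deliberate edit to the allowlist, while forgetting one fails safe (the code gets less access, not more).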
V8 Isolates for Lightweight, Fast Execution
For sub-100ms tasks like agent skills, plugins, or data transforms, use V8 isolates (the engine behind Chrome). They start in ~1ms and run JS/TS/Python/Wasm in an isolated memory context. No file system, no processes, no state: perfect for stateless, short-lived code.
In Harshil's OpenClaw alternative on Cloudflare Workers: AI generates Hacker News fetch skill, executes in dynamic Worker Isolate. Code:
// Pseudo-code: load AI-generated code into a fresh isolate with no network
const isolate = await loader.load({
  code: userCode,
  globalOutbound: null, // blocks all outbound network
  env: { db: restrictedQuery, logger } // the only capabilities the code can touch
});
const result = await isolate.fetch(new Request('https://sandbox/run'));
Bindings proxy via Worker RPC: AI calls db.query() → Worker validates/routes. Network options: null (default), proxy/routable (allowlist domains), or open (avoid). Scopes DB to user ID for multi-tenancy.
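A minimal sketch of such a binding, assuming a hypothetical `rawQuery` driver function: the sandboxed code receives only a wrapper method, and the trusted side re-scopes every call to the owning user.

```javascript
// Sketch: a capability-style DB binding scoped to one user.
// `rawQuery` stands in for a real driver; the table layout is hypothetical.
function makeScopedDb(userId, rawQuery) {
  return {
    // The sandbox only ever sees this method; it cannot reach the raw
    // connection, other users' rows, or arbitrary SQL.
    query(sql, params = []) {
      if (!/^select\s/i.test(sql.trim())) {
        throw new Error('capability denied: only SELECT is allowed');
      }
      // Re-scope to the owning user on the trusted side. (Appending the
      // predicate assumes the query already has a WHERE clause; a real
      // guard would parse the SQL instead of string-appending.)
      return rawQuery(`${sql} AND user_id = ?`, [...params, userId]);
    },
  };
}

// Fake driver for demonstration: records what the trusted side receives.
const calls = [];
const fakeRawQuery = (sql, params) => { calls.push({ sql, params }); return []; };

const db = makeScopedDb('user-42', fakeRawQuery);
db.query('SELECT * FROM notes WHERE archived = ?', [false]);
console.log(calls[0].params); // the last param is always the bound user id
```

Because the user ID is baked into the closure at bind time, the AI-generated code has no parameter it could manipulate to reach another tenant's rows.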
"Think of it like a room with no doors or windows. The only thing inside are what I put there before I locked it." (Harshil Agrawal, on isolates' isolation via bindings, emphasizing zero unintended access.)
Containers for Full Environments with FS and Processes
For npm installs, git clones, dev servers (e.g., motion graphics previews), use Linux containers. Seconds to start, real FS/processes/networking.
Harshil's PromptMotion.app (live at promptmotion.app): the user describes an animation → AI writes Remotion code → the app clones a starter repo, runs npm install, starts a dev server, and exposes a preview URL. One container per user via the Cloudflare Sandbox SDK plus a Durable Object coordinator.
Pseudo-code flow:
// Pseudo-code: per-user container lifecycle
const sandbox = sdk.getSandbox({ userId }); // user ID is the isolation boundary
await sandbox.exec('git clone starter-repo');
await sandbox.exec('npm install');
sandbox.startProcess('npm run dev'); // long-running dev server
const url = sandbox.exposePort(3000); // preview URL returned to the user
Users A and B get separate file systems: User A's ls sees only their own files. Proxy secrets: sandbox → Worker proxy endpoint → external API, so the key never enters the container.
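The secret-proxy hop can be sketched as follows; `makeSecretProxy`, the upstream URL, and the header are illustrative, not the PromptMotion implementation. The point is that the key is attached on the trusted side, so the sandbox only ever knows the proxy's path.

```javascript
// Sketch: a trusted proxy that holds the API key outside the sandbox.
// `upstreamFetch` and the endpoint are hypothetical stand-ins.
function makeSecretProxy(apiKey, upstreamFetch) {
  return async function handleProxyRequest(path, body) {
    // The sandbox never sees apiKey; it only calls this endpoint.
    return upstreamFetch(`https://api.example.com${path}`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify(body),
    });
  };
}

// Fake upstream fetch to show what crosses the trust boundary.
const seen = [];
const fakeFetch = async (url, init) => {
  seen.push({ url, auth: init.headers.Authorization });
  return { ok: true };
};

const proxy = makeSecretProxy('sk-secret', fakeFetch);
proxy('/v1/generate', { prompt: 'hi' }).then(() => console.log(seen[0].url));
```

This also gives the proxy a natural choke point for rate limiting, logging, and per-user quotas, since every external call funnels through trusted code.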
"One user, one sandbox, no exception." (Harshil Agrawal, stressing user ID as the isolation boundary to prevent cross-tenant leaks.)
Cleanup: destroy the sandbox in a try/finally on session end or a 30-minute idle timeout; Cloudflare's default is 10 minutes.
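That cleanup discipline can be sketched with a hypothetical sandbox handle shaped like the flow above (`getSandbox`, `exec`, `destroy`):

```javascript
// Sketch of guaranteed teardown via try/finally. The SDK shape is assumed,
// not the real Cloudflare Sandbox SDK surface.
async function runSession(sdk, userId, commands) {
  const sandbox = sdk.getSandbox({ userId });
  try {
    for (const cmd of commands) {
      await sandbox.exec(cmd);
    }
    return 'ok';
  } finally {
    // Runs on success, thrown error, or early return:
    // no idle containers left behind to bill or leak.
    await sandbox.destroy();
  }
}

// Fake SDK for demonstration; a real one would start an actual container.
const log = [];
const fakeSdk = {
  getSandbox: () => ({
    exec: async (cmd) => log.push(`exec:${cmd}`),
    destroy: async () => log.push('destroy'),
  }),
};

runSession(fakeSdk, 'user-1', ['npm install']).then(() => console.log(log));
```

Pairing the try/finally with a server-side idle timeout covers both the clean-exit and the abandoned-session cases.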
Trade-offs: Match Tool to Use Case
Isolates: JS/TS/Python/Wasm only, no FS/state/heavy compute. Wins: fast, cheap, simple for agents/plugins. Loses: no npm/processes.
Containers: full Linux (bash/Node/Git), but slower to start, more expensive, and more complex to operate. Wins: real apps/previews. Loses: millisecond startup; expect seconds.
Choose by needs—quick functions? Isolates. Full stacks? Containers. Proxy secrets always; route network via Worker for control.
"The key insight here is it's not about which one is the best. It's about what your use case requires." (Harshil Agrawal, on isolates vs containers, urging threat-model fit over one-size-fits-all.)
Key Takeaways
- Model threats: hallucinations (crashes/DoS), helpful leaks (secrets), injections (exfil)—all via full privileges.
- Adopt capability security: bind only needed APIs (e.g., scoped DB), null outbound network.
- Use V8 isolates for <100ms JS/Python tasks; Cloudflare Dynamic Worker Isolates example: 5 lines for secure exec.
- Deploy containers for FS/process needs; per-user via SDK/Durable Objects, proxy secrets.
- Enforce one-user-one-sandbox; try/finally cleanup to avoid idle liabilities.
- Proxy all secrets/network via Worker; never env-inject keys.
- Stateless isolates match agent tools; externalize state via bindings.
- Evaluate: secrets/net/FS/multi-tenant/compute before picking isolate/container.