Build Minimal Coding Agents Like Pi to Retake Control
Existing coding agent harnesses like Claude Code bloat context and break workflows; build an extensible minimal core like Pi for adaptability instead. Protect OSS from AI-generated slop with simple filters. Use agents only for scoped, non-critical tasks, and review all critical code by hand.
Ditch Bloated Harnesses for Context Control
Claude Code started simple but accreted unneeded features, frequent breakage (flickering UI, a third rewrite of its 2D renderer), and uncontrolled context changes: shifting system prompts, modified tool definitions, and injected "may or may not be relevant" reminders that confuse models. It offers no observability, locks you to one model family (Claude), and only shallow extensibility via inefficient hook processes. OpenCode prunes tool output once a token limit is hit (lobotomizing the model), injects LSP errors mid-edit (disrupting flow, since developers fix errors after writing), stores every message as a separate JSON file, and exposes its server via CORS. Benchmarks like Terminal-Bench suggest minimalism wins: the top leaderboard entry (as of Dec 2025) drives everything through a single tmux keystroke tool, beating far more complex harnesses. We are still in an experimental phase that needs malleable, self-modifying agents.
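The keystroke-only approach from that Terminal-Bench entry is worth seeing concretely. A minimal sketch, assuming nothing about the actual implementation: the agent's entire tool surface is one action that sends keystrokes to a tmux pane (`tmux send-keys` and `capture-pane` are real tmux subcommands; the `KeystrokeAction` shape and `toTmuxArgv` helper are hypothetical).

```typescript
// Sketch: the whole "tool surface" is one action -- send keystrokes to a tmux pane.
// The agent would read results back via `tmux capture-pane`.

type KeystrokeAction = { pane: string; keys: string; pressEnter?: boolean };

// Build the argv for `tmux send-keys`.
function toTmuxArgv(action: KeystrokeAction): string[] {
  const argv = ["tmux", "send-keys", "-t", action.pane, action.keys];
  if (action.pressEnter ?? true) argv.push("Enter");
  return argv;
}

const argv = toTmuxArgv({ pane: "agent:0.0", keys: "ls -la" });
console.log(argv.join(" ")); // tmux send-keys -t agent:0.0 ls -la Enter
```

One tiny tool definition like this costs the model almost no prompt tokens, which is exactly the minimalism argument.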
Build your own minimal core instead. Pi provides an AI-provider abstraction, an agent loop with tool calling, a flicker-free TUI renderer (game-dev roots), and four tools (read, edit, mesh, bash) whose definitions each fit in under 100 tokens. Models already know what a coding agent is from RL training, so the system prompt is ~20 lines, including the skills standard (plain markdown files). Ship docs and code examples with the agent, and let it modify itself via extensions. YOLO mode (no bash confirmations) leaves security to you. Extensions are TypeScript modules with an API covering tools, commands, events, state, compaction, and providers, hot-reloadable in-session. Examples: a /why slash command built from an Anthropic prompt in five minutes; multi-agent chat rooms; NES and Doom emulators. Pi scored 6th on Terminal-Bench (Oct 2025, before it had compaction). Bundle and share extensions via npm or GitHub; no silos.
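To make the extension model tangible, here is a hedged sketch of a TS module that registers a command and a tool against a host API. Pi's real extension API is not shown in this text, so every name here (`ExtensionApi`, `registerTool`, `activate`, the "why" command, the "shout" tool) is hypothetical; the point is the shape, not the specifics.

```typescript
// Hypothetical shapes -- pi's real extension API may differ. This only illustrates
// "a TS module that registers tools/commands and can be hot-reloaded in-session".

interface ToolDef {
  name: string;
  description: string; // kept tiny: small defs cost few prompt tokens
  run: (input: string) => string;
}

interface ExtensionApi {
  registerTool(tool: ToolDef): void;
  registerCommand(name: string, handler: (args: string) => string): void;
}

// A minimal in-memory host standing in for the agent core.
class Host implements ExtensionApi {
  tools = new Map<string, ToolDef>();
  commands = new Map<string, (args: string) => string>();
  registerTool(tool: ToolDef) { this.tools.set(tool.name, tool); }
  registerCommand(name: string, handler: (args: string) => string) {
    this.commands.set(name, handler);
  }
}

// The extension module: one activate function that wires itself into the host.
function activate(api: ExtensionApi) {
  api.registerCommand("why", (args) =>
    `Explain the reasoning behind: ${args}`); // e.g. a /why-style slash command
  api.registerTool({
    name: "shout",
    description: "uppercase text",
    run: (input) => input.toUpperCase(),
  });
}

const host = new Host();
activate(host); // hot-reload = re-running activate with a freshly loaded module
console.log(host.commands.get("why")!("this refactor"));
```

Hot reload falls out naturally: re-import the module and call `activate` again against a fresh registry, mid-session.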
Block AI Slop in OSS with Human Vouches
AI agent instances ("clankers") flood issue trackers: roughly half the issues on OpenClaw and Pi are garbage, and tldraw closed its trackers entirely. Countermeasures: auto-close PRs and issues unless they read like a human voice and fit on one screen (clankers ignore the template, humans comply); whitelist approved accounts; label clanker interactions so they can be deprioritized; embed issues/PRs in a 3D space to cluster duplicates; and take an "OSS vacation" by closing trackers when needed. A vouch system (inspired by Mitchell Hashimoto) filters near-perfectly, since clankers don't iterate to get vouched.
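The triage heuristic above (vouched author, human voice, screen-length text) is simple enough to sketch. Everything here is illustrative: the author whitelist, the line threshold, and the "slop tell" phrases are made-up stand-ins, not a real filter anyone ships.

```typescript
// Hypothetical triage for incoming issues/PRs: auto-close unless the author is
// vouched for, or the text passes crude "written by a human, fits on a screen" checks.

interface Submission { author: string; body: string }

const vouched = new Set(["trusted-contributor"]); // whitelist of approved accounts

function shouldAutoClose(sub: Submission, maxLines = 40): boolean {
  if (vouched.has(sub.author)) return false;     // vouched humans always pass
  const lines = sub.body.split("\n");
  if (lines.length > maxLines) return true;      // longer than one screen
  // Crude slop tells; clankers rarely iterate to comply with a template.
  const slopTells = ["As an AI", "I hope this helps", "Certainly!"];
  return slopTells.some((t) => sub.body.includes(t));
}

console.log(shouldAutoClose({ author: "rando", body: "Certainly! Here is a fix..." })); // true
```

The interesting property is not the heuristic itself but the asymmetry: a human who gets auto-closed will edit and resubmit; a clanker won't.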
Scope Agents Tightly, Review Critical Code
Agents compound errors ("boo boos"): they learn serially from internet garbage (90% of which is mediocre code), face no bottlenecks, and feel no delayed pain. One human adds a few changes a day; ten agents explode a codebase into unreviewable complexity. Local decisions spawn intertwined abstractions, duplication, backwards-compat shims, and enterprise bloat. Specs with blanks get filled from training-data slop. Lacking pain feedback, agents keep producing slop without learning. Long contexts and agentic search fail at scale; patches are locally plausible but break global structure. Tests are untrustworthy if the agent wrote them.
Use agents for scoped tasks where the full context fits and an eval function exists: hill-climbing, auto-research, non-critical or boring work, building repro cases, rubber-ducking. Evaluate the output, take the reasonable parts, and finalize by hand. Slow down: ask why you're building; say no to features; polish the few that matter. Wipe slop on non-critical paths; hand-write critical code (the friction builds understanding). Discipline means humans stay a productive bottleneck, via pain and refactoring.
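The "eval function" criterion can be made concrete with a generic hill-climb loop: the agent proposes candidates, an objective scorer (not the agent grading itself) keeps the best one, and a human still reviews the result. All names here are hypothetical; `propose` stands in for an agent/model call, replaced by a toy mutator so the sketch runs.

```typescript
// Sketch of "use agents only where an eval function exists": hill-climb over
// candidate solutions, keep the best, hand the result to a human for review.

type Eval = (candidate: number[]) => number; // higher is better

function hillClimb(
  start: number[],
  evaluate: Eval,
  propose: (c: number[]) => number[],
  steps: number,
): number[] {
  let best = start;
  let bestScore = evaluate(best);
  for (let i = 0; i < steps; i++) {
    const next = propose(best);   // agent suggests a variation
    const score = evaluate(next); // objective eval, not agent self-grading
    if (score > bestScore) { best = next; bestScore = score; }
  }
  return best; // a human still reviews/finalizes what the loop produces
}

// Toy objective: get every element close to 10. `propose` is a stand-in mutator.
const evaluate: Eval = (c) => -c.reduce((s, x) => s + Math.abs(10 - x), 0);
const propose = (c: number[]) => c.map((x) => x + (Math.random() < 0.5 ? 1 : -1));
const result = hillClimb([0, 0, 0], evaluate, propose, 500);
console.log(result);
```

Tasks without such a scorer (architecture, critical paths) fail the criterion by construction, which is the point of the section above.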