Build Minimal Coding Agents Like Pi to Retake Control
Existing coding agent harnesses like Claude Code bloat context and break workflows; build an extensible minimal core like Pi for adaptability instead. Protect OSS from AI-generated slop with simple filters. Use agents only for scoped, non-critical tasks, and review all critical code by hand.
Ditch Bloated Harnesses for Context Control
Claude Code started simple but accreted unneeded features, frequent breakage (flickering UI, a third rewrite of its 2D renderer), and uncontrolled context changes: shifting system prompts, modified tool definitions, and injected "may or may not be relevant" reminders that confuse models. It offers no observability, locks you to one model family (Claude), and only shallow extensibility via inefficient hook processes. OpenCode prunes tool output once a token limit is hit (lobotomizing the model), injects LSP errors mid-edit (disrupting flow, since developers fix errors after writing), stores every message as a separate JSON file, and exposes its server via CORS. Benchmarks like Terminal-Bench suggest minimalism wins: the top leaderboard entry (as of Dec 2025) drives everything through a single tmux keystroke tool, beating far more complex harnesses. We are still in an experimental phase that needs malleable, self-modifying agents.
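The keystroke-only approach from that Terminal-Bench entry is worth seeing concretely. A minimal sketch, assuming nothing about the actual implementation: the agent's entire tool surface is one action that sends keystrokes to a tmux pane (`tmux send-keys` and `capture-pane` are real tmux subcommands; the `KeystrokeAction` shape and `toTmuxArgv` helper are hypothetical).

```typescript
// Sketch: the whole "tool surface" is one action -- send keystrokes to a tmux pane.
// The agent would read results back via `tmux capture-pane`.

type KeystrokeAction = { pane: string; keys: string; pressEnter?: boolean };

// Build the argv for `tmux send-keys`.
function toTmuxArgv(action: KeystrokeAction): string[] {
  const argv = ["tmux", "send-keys", "-t", action.pane, action.keys];
  if (action.pressEnter ?? true) argv.push("Enter");
  return argv;
}

const argv = toTmuxArgv({ pane: "agent:0.0", keys: "ls -la" });
console.log(argv.join(" ")); // tmux send-keys -t agent:0.0 ls -la Enter
```

One tiny tool definition like this costs the model almost no prompt tokens, which is exactly the minimalism argument.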
Build your own minimal core instead. Pi provides an AI-provider abstraction, an agent loop with tool calling, a flicker-free TUI renderer (game-dev roots), and four tools (read, edit, mesh, bash) whose definitions each fit in under 100 tokens. Models already know what a coding agent is from RL training, so the system prompt is ~20 lines, including the skills standard (plain markdown files). Ship docs and code examples with the agent, and let it modify itself via extensions. YOLO mode (no bash confirmations) leaves security to you. Extensions are TypeScript modules with an API covering tools, commands, events, state, compaction, and providers, hot-reloadable in-session. Examples: a /why slash command built from an Anthropic prompt in five minutes; multi-agent chat rooms; NES and Doom emulators. Pi scored 6th on Terminal-Bench (Oct 2025, before it had compaction). Bundle and share extensions via npm or GitHub; no silos.
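To make the extension model tangible, here is a hedged sketch of a TS module that registers a command and a tool against a host API. Pi's real extension API is not shown in this text, so every name here (`ExtensionApi`, `registerTool`, `activate`, the "why" command, the "shout" tool) is hypothetical; the point is the shape, not the specifics.

```typescript
// Hypothetical shapes -- pi's real extension API may differ. This only illustrates
// "a TS module that registers tools/commands and can be hot-reloaded in-session".

interface ToolDef {
  name: string;
  description: string; // kept tiny: small defs cost few prompt tokens
  run: (input: string) => string;
}

interface ExtensionApi {
  registerTool(tool: ToolDef): void;
  registerCommand(name: string, handler: (args: string) => string): void;
}

// A minimal in-memory host standing in for the agent core.
class Host implements ExtensionApi {
  tools = new Map<string, ToolDef>();
  commands = new Map<string, (args: string) => string>();
  registerTool(tool: ToolDef) { this.tools.set(tool.name, tool); }
  registerCommand(name: string, handler: (args: string) => string) {
    this.commands.set(name, handler);
  }
}

// The extension module: one activate function that wires itself into the host.
function activate(api: ExtensionApi) {
  api.registerCommand("why", (args) =>
    `Explain the reasoning behind: ${args}`); // e.g. a /why-style slash command
  api.registerTool({
    name: "shout",
    description: "uppercase text",
    run: (input) => input.toUpperCase(),
  });
}

const host = new Host();
activate(host); // hot-reload = re-running activate with a freshly loaded module
console.log(host.commands.get("why")!("this refactor"));
```

Hot reload falls out naturally: re-import the module and call `activate` again against a fresh registry, mid-session.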
Block AI Slop in OSS with Human Vouches
AI agent instances ("clankers") flood issue trackers: roughly half the issues on OpenClaw and Pi are garbage, and tldraw closed its trackers entirely. Countermeasures: auto-close PRs and issues unless they read like a human voice and fit on one screen (clankers ignore the template, humans comply); whitelist approved accounts; label clanker interactions so they can be deprioritized; embed issues/PRs in a 3D space to cluster duplicates; and take an "OSS vacation" by closing trackers when needed. A vouch system (inspired by Mitchell Hashimoto) filters near-perfectly, since clankers don't iterate to get vouched.
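The triage heuristic above (vouched author, human voice, screen-length text) is simple enough to sketch. Everything here is illustrative: the author whitelist, the line threshold, and the "slop tell" phrases are made-up stand-ins, not a real filter anyone ships.

```typescript
// Hypothetical triage for incoming issues/PRs: auto-close unless the author is
// vouched for, or the text passes crude "written by a human, fits on a screen" checks.

interface Submission { author: string; body: string }

const vouched = new Set(["trusted-contributor"]); // whitelist of approved accounts

function shouldAutoClose(sub: Submission, maxLines = 40): boolean {
  if (vouched.has(sub.author)) return false;     // vouched humans always pass
  const lines = sub.body.split("\n");
  if (lines.length > maxLines) return true;      // longer than one screen
  // Crude slop tells; clankers rarely iterate to comply with a template.
  const slopTells = ["As an AI", "I hope this helps", "Certainly!"];
  return slopTells.some((t) => sub.body.includes(t));
}

console.log(shouldAutoClose({ author: "rando", body: "Certainly! Here is a fix..." })); // true
```

The interesting property is not the heuristic itself but the asymmetry: a human who gets auto-closed will edit and resubmit; a clanker won't.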
Scope Agents Tightly, Review Critical Code
Agents compound errors ("boo boos"): they learn serially from internet garbage (90% of which is mediocre code), face no bottlenecks, and feel no delayed pain. One human adds a few changes a day; ten agents explode a codebase into unreviewable complexity. Local decisions spawn intertwined abstractions, duplication, backwards-compat shims, and enterprise bloat. Specs with blanks get filled from training-data slop. Lacking pain feedback, agents keep producing slop without learning. Long contexts and agentic search fail at scale; patches are locally plausible but break global structure. Tests are untrustworthy if the agent wrote them.
Use agents for scoped tasks where the full context fits and an eval function exists: hill-climbing, auto-research, non-critical or boring work, building repro cases, rubber-ducking. Evaluate the output, take the reasonable parts, and finalize by hand. Slow down: ask why you're building; say no to features; polish the few that matter. Wipe slop on non-critical paths; hand-write critical code (the friction builds understanding). Discipline means humans stay a productive bottleneck, via pain and refactoring.
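The "eval function" criterion can be made concrete with a generic hill-climb loop: the agent proposes candidates, an objective scorer (not the agent grading itself) keeps the best one, and a human still reviews the result. All names here are hypothetical; `propose` stands in for an agent/model call, replaced by a toy mutator so the sketch runs.

```typescript
// Sketch of "use agents only where an eval function exists": hill-climb over
// candidate solutions, keep the best, hand the result to a human for review.

type Eval = (candidate: number[]) => number; // higher is better

function hillClimb(
  start: number[],
  evaluate: Eval,
  propose: (c: number[]) => number[],
  steps: number,
): number[] {
  let best = start;
  let bestScore = evaluate(best);
  for (let i = 0; i < steps; i++) {
    const next = propose(best);   // agent suggests a variation
    const score = evaluate(next); // objective eval, not agent self-grading
    if (score > bestScore) { best = next; bestScore = score; }
  }
  return best; // a human still reviews/finalizes what the loop produces
}

// Toy objective: get every element close to 10. `propose` is a stand-in mutator.
const evaluate: Eval = (c) => -c.reduce((s, x) => s + Math.abs(10 - x), 0);
const propose = (c: number[]) => c.map((x) => x + (Math.random() < 0.5 ? 1 : -1));
const result = hillClimb([0, 0, 0], evaluate, propose, 500);
console.log(result);
```

Tasks without such a scorer (architecture, critical paths) fail the criterion by construction, which is the point of the section above.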