Pi: Minimal Agent to Reclaim Workflow Control

Existing Coding Agents Undermine Control and Reliability

Commercial tools like Claude Code start simple but devolve into unreliable bloat: daily breakage from high-velocity feature releases, hidden context manipulation (e.g., system prompts that change, irrelevant reminders inserted with each release, tools removed), zero observability, a fixed model (Claude), and shallow extensibility via inefficient process-spawning hooks. OSS alternatives like OpenCode prune tool output at token limits (lobotomizing the model), inject LSP errors mid-edit (confusing iterative coding), store messages as individual JSON files (inefficient), and expose a server with CORS open to any browser. Benchmarks like Terminal Bench reveal the truth: the benchmark's own minimal keystroke tool outperforms complex harnesses (top scores across models on the Dec 2025 leaderboard), which suggests we are still in a 'try around and find out' phase where overengineering hurts.

Result: you lose workflow sovereignty because the tool, not you, dictates the context. Thesis: we need malleable, self-modifying agents.

Pi Delivers Extensibility Without Bloat

Pi strips down to essentials: an AI provider abstraction, an agent core (a while loop plus tool calling), a flicker-free terminal UI (game dev roots), and four tools (read_file, edit_file, bash, message). The system prompt is tiny (~100 tokens); models already know how coding agents work from RL training, so no verbose setup is needed. Pi ships with handcrafted docs and code examples, and the agent modifies itself via extensions (e.g., 'build sub-agent support'). Security is YOLO by default (customize as needed, no nagging dialogs).
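
The "while loop + tool calling" core can be sketched roughly as follows. This is a minimal illustration, not pi's actual code: Provider, ModelReply, runAgent and scriptedProvider are hypothetical names, the calls are synchronous for brevity (a real provider would be async), and a scripted stand-in replaces the model so the sketch runs offline.

```typescript
// Minimal agent-loop sketch. Every name here is hypothetical, not pi's API.
type ToolCall = { name: string; args: Record<string, string> };
type Message =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; name: string; content: string };
type ModelReply = { content: string; toolCalls: ToolCall[] };

// Provider abstraction: anything that maps a transcript to a reply.
interface Provider {
  complete(messages: Message[]): ModelReply;
}

type Tool = (args: Record<string, string>) => string;

function runAgent(
  provider: Provider,
  tools: Record<string, Tool>,
  userPrompt: string,
  maxTurns = 10,
): Message[] {
  const messages: Message[] = [
    { role: "system", content: "You are a coding agent." },
    { role: "user", content: userPrompt },
  ];
  let turn = 0;
  // The core is a plain while loop: ask the model, run requested tools, repeat.
  while (turn < maxTurns) {
    turn++;
    const reply = provider.complete(messages);
    messages.push({ role: "assistant", content: reply.content });
    if (reply.toolCalls.length === 0) break; // no tool calls: the model is done
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const output = tool ? tool(call.args) : `error: unknown tool ${call.name}`;
      // Tool output goes back into the transcript unmodified.
      messages.push({ role: "tool", name: call.name, content: output });
    }
  }
  return messages;
}

// Scripted stand-in for a real model: replays canned replies in order.
function scriptedProvider(replies: ModelReply[]): Provider {
  let i = 0;
  return {
    complete: () => replies[i++] ?? { content: "done", toolCalls: [] },
  };
}
```

Note that in this sketch nothing touches the transcript except the loop itself, so what the model sees is exactly what was appended, which is the observability property argued for above.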

Extensions are TypeScript modules with full API access: they can add tools, commands, and shortcuts, hook events, and supply custom compaction, providers, and session state. They hot-reload during sessions for game-dev-fast iteration. Publish to npm or GitHub; no silos. Examples: slash/why from Claude prompt (built in 5min), multi-agent chat rooms, NES/Doom emulators. Build extensions by prompting pi itself. Pre-packaged: the skills standard (markdown tools). Pi scored 6th on Terminal Bench (Oct 2025, pre-compaction). Retake control: pi adapts to you.
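
To make the extension idea concrete, here is a rough sketch of a host that lets a module register a tool, a command, and an event hook, and then reloads everything in place. The whole surface (ExtensionApi, ExtensionHost, registerTool, registerCommand, on, reload) is invented for illustration; pi's real extension API may look quite different.

```typescript
// Hypothetical extension surface; not pi's real API.
type Handler = (payload: string) => void;

interface ExtensionApi {
  registerTool(name: string, run: (input: string) => string): void;
  registerCommand(name: string, run: (args: string) => string): void;
  on(event: string, handler: Handler): void;
}

// An extension is just a function that receives the API and registers things.
type Extension = (api: ExtensionApi) => void;

class ExtensionHost implements ExtensionApi {
  tools = new Map<string, (input: string) => string>();
  commands = new Map<string, (args: string) => string>();
  private handlers = new Map<string, Handler[]>();

  registerTool(name: string, run: (input: string) => string): void {
    this.tools.set(name, run);
  }
  registerCommand(name: string, run: (args: string) => string): void {
    this.commands.set(name, run);
  }
  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }
  emit(event: string, payload: string): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }
  // "Hot reload" in this sketch: drop all registrations, then load again.
  reload(extensions: Extension[]): void {
    this.tools.clear();
    this.commands.clear();
    this.handlers.clear();
    for (const extension of extensions) extension(this);
  }
}

// Example extension: one tool, one slash-style command, one event hook.
const seenTools: string[] = [];
const wordCountExtension: Extension = (api) => {
  api.registerTool("word_count", (input) =>
    String(input.trim().split(/\s+/).length),
  );
  api.registerCommand("tools-used", () => seenTools.join(", "));
  api.on("tool_call", (name) => {
    seenTools.push(name);
  });
};
```

Calling `host.reload([wordCountExtension])` again after editing the extension swaps in the new behavior without restarting the session, which is the iteration speed the hot-reload claim is about.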

OSS Under Siege: Filter Clanker Spam Aggressively

Agents ('clankers') flood issue trackers: tldraw closes issues, and the OpenCode, OpenClaw, and pi repos are half-filled with garbage PRs and issues from unaware users (pi is collateral damage because it sits at OpenClaw's core). Countermeasures: auto-close PRs and require a 'human voice' issue first (<1 screen); whitelist approved accounts; deprioritize agent interactions; 3D cluster visualization of issues; 'OSS vacation' (close the tracker arbitrarily). Vouch system (Mitchell's): works well precisely because clankers ignore instructions. Reclaims maintainer sanity.
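
The countermeasures above amount to a small triage policy. A sketch of how it could be written as a pure function follows; the Submission shape, the triage name, and the 40-line cutoff standing in for "less than one screen" are all assumptions made for illustration, and nothing here calls a real tracker API.

```typescript
// Hypothetical triage policy; shapes and thresholds are illustrative only.
type Submission = {
  kind: "issue" | "pr";
  author: string;
  body: string;
};
type Decision = { action: "accept" | "close"; reason: string };

// Assumed stand-in for "less than one screen".
const MAX_ISSUE_LINES = 40;

function triage(submission: Submission, approved: Set<string>): Decision {
  // Approved (vouched) accounts skip all filters.
  if (approved.has(submission.author)) {
    return { action: "accept", reason: "approved account" };
  }
  // PRs from unknown accounts are closed; a short issue must come first.
  if (submission.kind === "pr") {
    return { action: "close", reason: "open a short issue in your own words first" };
  }
  // Issues must fit on roughly one screen.
  const lineCount = submission.body.split("\n").length;
  if (lineCount > MAX_ISSUE_LINES) {
    return { action: "close", reason: "issue longer than one screen" };
  }
  return { action: "accept", reason: "short issue from unapproved account" };
}
```

Keeping the policy a pure function makes it easy to test and to change the rules without touching whatever bot applies the decision.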

Agents Compound 'Boo boos': Use for Scoped Tasks Only

Agents amplify internet slop (90% garbage code): local decisions yield enterprise-grade complexity (abstractions, duplication, backwards compatibility, defense in depth) within weeks. Detailed specs become programs; any blanks are filled with mediocre training data. Unlike humans, who learn from pain and act as a bottleneck on errors, agents pile up boo boos serially and never apply global fixes. Review becomes impossible: one human adds a few boo boos a day; 10 agents multiply them. Review agents create 'ouroboros' loops. Long contexts and agentic search fail; tests are untrustworthy because the agent wrote them.

Good tasks: scoped (modular code, all context fits), evaluable (hill-climb), non-critical (repros, rubber duck, boring wipes). Post-agent: evaluate (discard most), then human-finalize critical code (read every line; the friction builds understanding). Rules: slow down, say no to features, hand-write important code (agents assist, they don't decide), polish with agents. Discipline over token-maxing: humans remain essential.