Agent Harness: 9 Components Beyond Frameworks
A harness is a fixed while-loop architecture that turns one-shot LLMs into iterative agents, with tools, context control, subagents, memory, and safety pre-wired—unlike LangChain-style frameworks, which you assemble yourself.
Harness Delivers Ready Agents, Frameworks Require Wiring
Turn one-shot LLMs into agents by wrapping them in a fixed harness architecture: a while loop that lets the model act (via tools), observe results, and iterate until solving the goal or hitting an iteration cap. This contrasts with frameworks like LangChain, LangGraph, AutoGen, and CrewAI, which provide abstractions (chains, memory, retrievers) for humans to assemble agents. Harnesses ship pre-wired for immediate use—you input a goal, it handles the rest. Examples include coding tools like Cursor and Claude Code, which evolved similar architectures for repo-wide code editing, starting from concrete problems rather than general abstractions.
Trade-off: Frameworks offer flexibility for custom agents but demand architecture work; harnesses prioritize out-of-box reliability, assuming the fixed loop + registry covers 80% of needs.
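The act-observe-iterate loop described above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `call_model` and `run_tool` are hypothetical stand-ins for an LLM API call and a tool dispatcher.

```python
# Minimal agent loop: act via tools, observe results, iterate until the
# model answers in plain text or the iteration cap is hit.
# `call_model` and `run_tool` are hypothetical stand-ins, not a real API.

MAX_ITERATIONS = 20

def call_model(messages):
    # Placeholder: a real harness would call an LLM API here.
    # Returning a text-only reply (no tool_call) ends the loop.
    return {"text": "done", "tool_call": None}

def run_tool(tool_call):
    return f"result of {tool_call}"

def run_agent(goal):
    messages = [{"role": "user", "content": goal}]
    for _ in range(MAX_ITERATIONS):      # cap prevents infinite loops
        reply = call_model(messages)
        if reply["tool_call"] is None:   # text-only response: finished
            return reply["text"]
        observation = run_tool(reply["tool_call"])
        messages.append({"role": "tool", "content": observation})
    return "stopped: iteration cap reached"
```

Everything else in the harness—compaction, hooks, permissions—slots into this loop around the `call_model` and `run_tool` steps.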
9 Components for Production Harnesses
Build robust agents with these interconnected parts, drawn from tools like Claude Code (200k-token budget, now 1M for Opus):
- While Loop Engine: Core iteration—model reads system prompt, calls tools, feeds results back, repeats until text-only response or max iterations (prevents infinite loops).
- Context Management & Compaction: Tree-like context grows with messages/tools; at 80-90% of limit (e.g., half of 1M tokens), keep recent messages verbatim, summarize older ones. Poor compaction loses critical history, causing failures.
- Tools vs Skills + Registry: Tools are primitives (read file, run bash); skills encode team knowledge via Markdown files (e.g., git commit process). Registry maps names to handlers, permissions, descriptions—model sees lightweight descriptors to decide calls.
- Subagent Management: For parallel or large tasks, spawn isolated subagents with restricted tools, focused prompts, and their own sessions—spawn, restrict, collect outputs.
- Built-in Skills: Ship essentials like file read/write/edit/search, bash execution, code navigation, git commits, PRs, tests. Use stdlib only for primitives to avoid deps.
- Session Persistence/Memory: An append-only JSON/Markdown log records every event (messages, tools, compactions) to disk for crash-proof resumption—replay rebuilds state exactly.
- Dynamic System Prompt Assembly: Pipeline scans directories for files like CLAUDE.md or AGENTS.md, injects after static prefix (order preserves caching). Enables contextual instructions without hardcoding.
- Lifecycle Hooks: Pre-tool: allow/deny/modify calls (JSON exit codes). Post-tool: audit results, log. Enables extensibility without core changes, key for enterprise.
- Permissions/Safety: Tools declare min perms (read-only, workspace, full). Harness enforces at dispatch; dynamic classification for bash (ls=read, rm=full); interactive user approvals for risky actions.
These make harnesses safe and durable—e.g., Anthropic separates session management from the core for scalability.
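The compaction policy from the list above can be sketched as follows. It is a minimal illustration under simplifying assumptions: `count_tokens` is a crude whitespace proxy for a real tokenizer, and `summarize` is a hypothetical stand-in for an LLM summarization call.

```python
# Compaction sketch: when history nears the token budget, keep the most
# recent messages verbatim and fold older ones into one summary message.

def count_tokens(msg):
    return len(msg["content"].split())   # crude proxy for a real tokenizer

def summarize(messages):
    # Hypothetical: a real harness would ask the model to summarize.
    return "summary of %d earlier messages" % len(messages)

def compact(messages, budget, keep_recent=4):
    total = sum(count_tokens(m) for m in messages)
    if total <= budget * 0.9:            # trigger at ~90% of the limit
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent
```

The failure mode the list warns about lives in `summarize`: if it drops critical history (file paths, earlier decisions), every later iteration works from a corrupted picture of the session.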
Python Reference Implementation Template
Core engine: While loop assembles dynamic prompt, compacts context if oversized (summarize old), handles tool/subagent calls, caps iterations. Tools/skills as dataclasses (name, perms, handler, desc) in dict registry—descriptors for model, skills load MD on invoke.
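The tool registry half of this can be sketched directly from the description—a dataclass with name, permission, handler, and description, kept in a dict, with lightweight descriptors exposed to the model. Field names here are illustrative, not a fixed spec.

```python
from dataclasses import dataclass
from typing import Callable

# Registry sketch: tools declared as dataclasses, looked up by name;
# the model only ever sees the lightweight descriptors.

@dataclass
class Tool:
    name: str
    description: str
    permission: str            # "read-only" | "workspace" | "full"
    handler: Callable[..., str]

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool):
    REGISTRY[tool.name] = tool

def descriptors():
    # What the model sees when deciding which tool to call.
    return [{"name": t.name, "description": t.description}
            for t in REGISTRY.values()]

def dispatch(name: str, **kwargs) -> str:
    return REGISTRY[name].handler(**kwargs)

register(Tool("echo", "repeat input back", "read-only", lambda text: text))
```

Skills fit the same shape: their handler loads a Markdown file on invoke instead of running code directly.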
Subagents: Archetypes (explore/general/verify) with perm/tool restrictions, focused prompts.
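The archetype table and spawn step might look like this; the archetype names come from the text, but the tool lists and field names are assumptions for illustration.

```python
# Subagent sketch: archetypes restrict tools and permissions; each
# subagent gets a focused prompt and its own isolated message session.

ARCHETYPES = {
    "explore": {"tools": ["read_file", "search"], "permission": "read-only"},
    "general": {"tools": ["read_file", "edit_file", "bash"], "permission": "workspace"},
    "verify":  {"tools": ["bash", "read_file"], "permission": "read-only"},
}

def spawn_subagent(archetype, task):
    spec = ARCHETYPES[archetype]
    return {
        "prompt": f"You are a {archetype} agent. Task: {task}",
        "allowed_tools": spec["tools"],   # restricted tool set
        "permission": spec["permission"],
        "messages": [],                   # isolated session
    }
```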
Built-ins: Stdlib file read/bash.
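A stdlib-only version of these primitives might look like this (a sketch, not hardened production code—real harnesses add path sandboxing and output truncation):

```python
import subprocess
from pathlib import Path

# Built-in primitives using only the standard library, no dependencies.

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"

def run_bash(command: str, timeout: int = 30) -> str:
    # Capture stdout+stderr; the timeout keeps a hung command from
    # stalling the agent loop.
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout + result.stderr
```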
Memory: append(event) writes JSON lines (flush for durability); replay() reconstructs.
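The append/replay pair can be sketched as a small class over JSON Lines; the class name and event shape are illustrative.

```python
import json

# Append-only session log: every event is one JSON line, flushed to
# disk immediately; replaying the file rebuilds state exactly.

class SessionLog:
    def __init__(self, path):
        self.path = path

    def append(self, event: dict):
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()                      # durability across crashes

    def replay(self):
        events = []
        try:
            with open(self.path) as f:
                for line in f:
                    events.append(json.loads(line))
        except FileNotFoundError:
            pass                           # fresh session, nothing logged
        return events
```

Because the log is append-only, a crashed session resumes by replaying events in order—no separate state snapshot to keep in sync.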
Prompts: Static + dynamic dir scan (static first).
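A minimal version of that pipeline, assuming the CLAUDE.md/AGENTS.md filenames mentioned earlier (the static prefix text is a placeholder):

```python
from pathlib import Path

# Dynamic prompt assembly: static prefix first (its stable position
# preserves prompt caching), then any instruction files found in the
# workspace tree.

STATIC_PREFIX = "You are a coding agent."   # placeholder prefix

def assemble_prompt(workspace: str, names=("CLAUDE.md", "AGENTS.md")) -> str:
    parts = [STATIC_PREFIX]
    for name in names:
        for f in sorted(Path(workspace).rglob(name)):
            parts.append(f"# From {f}\n{f.read_text()}")
    return "\n\n".join(parts)
```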
Hooks: Pre/post functions on tool events.
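The hook dispatch might look like this. Note the verdict-dict signature here is illustrative—Claude Code's actual hooks communicate via JSON and exit codes, as noted in the component list.

```python
# Lifecycle hooks sketch: pre-tool hooks may allow, deny, or modify a
# call; post-tool hooks audit results but cannot block them.

pre_hooks, post_hooks = [], []

def run_tool_with_hooks(name, args, handler):
    for hook in pre_hooks:
        verdict = hook(name, args)
        if verdict.get("action") == "deny":
            return f"denied: {verdict.get('reason', '')}"
        args = verdict.get("args", args)   # hooks may rewrite arguments
    result = handler(**args)
    for hook in post_hooks:
        hook(name, args, result)           # audit/log only
    return result

# Example pre-hook: block any bash call mentioning `rm`.
pre_hooks.append(lambda name, args:
    {"action": "deny", "reason": "rm blocked"}
    if name == "bash" and "rm" in args.get("command", "")
    else {"action": "allow"})
```

Because hooks wrap dispatch rather than living inside individual tools, enterprise policies plug in without touching the core loop.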
Permissions: Check declared + dynamic parse (safe=read like grep; dangerous=full like sudo); user approve.
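The dynamic bash classification can be sketched with a first-word lookup; the command lists below are illustrative, not exhaustive.

```python
import shlex

# Permission sketch: classify a bash command by its first word, using
# the grep/sudo examples from the text. Lists are illustrative only.

READ_ONLY = {"ls", "cat", "grep", "head", "find"}
DANGEROUS = {"rm", "sudo", "chmod", "curl"}

def classify(command: str) -> str:
    first = shlex.split(command)[0]
    if first in READ_ONLY:
        return "read-only"
    if first in DANGEROUS:
        return "full"
    return "workspace"           # default: confined to the workspace

def needs_approval(command: str) -> bool:
    # Risky ("full") commands require interactive user approval.
    return classify(command) == "full"
```

A real classifier also has to handle pipes, subshells, and `&&` chains—any segment of a compound command can escalate the whole call's permission level.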
This ~100-line skeleton supports all 9 components—extend by registering tools/skills, no framework deps.