Complex Agents Need High-Bandwidth Artifacts, Not Chat

Chat interfaces cause context rot and offer low control in complex agents. Build trust through verifiability proxies, decomposition, guardrails, and skills. Collaborate through persistent documents and tables for planning, execution, and review in vertical AI domains such as legal.

Overcome Context Rot with Verifiable Tasks and Proxies

Long-running agents fail in chat because of context compaction: after 30+ minutes of work, a requested fix alters unrelated parts, like asking for a change to only clause three of a contract and getting the whole document rewritten. The economics have shifted over the past 6-12 months: execution is cheap, while planning and reviewing are the bottlenecks in complex end-to-end tasks. Apply the verifier's rule: tasks that are easy to solve and easy to verify get automated, via agent loops or RL. Legal examples: checking definitions is verifiable; drafting contracts is not (only courts truly verify); litigation strategy defies objective truth, since five lawyers will disagree. Coding mirrors this: simple features are verifiable via TDD or browser access; consumer apps are not. Boost agent performance by shifting tasks left on the solvability-verifiability spectrum: use golden precedents as proxies (e.g., similarity to past contracts), decompose work (the human picks the risk profile and precedents; the agent handles formatting and linting), and add guardrails (limit the agent to specific files, directories, or sites). This prevents erratic actions and avoids the two extremes of Claude Code's constant approval prompts (low trust) and YOLO mode (high trust, with the risk of deleting the prod database).
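Two of the levers above can be sketched in code: a path guardrail that confines agent actions to an allowed directory, and a golden-precedent proxy that scores a draft by similarity to past contracts. This is a minimal illustration, not the article's implementation; the 0.6 threshold, the function names, and the `/workspace/contracts` root are all assumptions.

```python
from difflib import SequenceMatcher
from pathlib import Path

ALLOWED_ROOT = Path("/workspace/contracts")  # guardrail: the agent may only touch this tree

def within_guardrails(path: str) -> bool:
    """Reject any action that targets a file outside the allowed directory."""
    try:
        Path(path).resolve().relative_to(ALLOWED_ROOT.resolve())
        return True
    except ValueError:
        return False

def precedent_score(draft: str, golden_precedents: list[str]) -> float:
    """Proxy verification: similarity of the draft to the closest golden precedent."""
    return max(SequenceMatcher(None, draft, p).ratio() for p in golden_precedents)

def review_step(draft: str, target_path: str, golden_precedents: list[str]) -> str:
    """Shift the task left: auto-accept only what the proxy can verify."""
    if not within_guardrails(target_path):
        return "blocked: outside allowed directory"
    if precedent_score(draft, golden_precedents) < 0.6:  # threshold is an assumption
        return "escalate: low similarity to precedent, needs human review"
    return "accept: close to golden precedent"
```

The point of the proxy is not that string similarity is a good legal verifier, but that any cheap, mechanical check lets the agent loop on the verifiable part while unverifiable judgment is escalated to the human.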

Encode Human Judgment via Skills and Elicitation for Control

Treat agent workflows as DAGs or trees; intervening only at the root yields low control, since humans cannot steer sub-tasks like reviewing specific clauses in an employment contract. Upfront planning helps with alignment (approve the steps and clauses first) but demands exhaustive work before execution, misses contingencies (e.g., a special EU termination law), and feels like a coworker vanishing after the kickoff. Skills excel by embedding judgment at the nodes: define 'review confidentiality this way' or handle special cases dynamically, enabling progressive discovery. Elicitation unblocks agents: when unsure, the agent decides, logs the choice to a decision log for later human review and reversal, and keeps moving instead of halting. For Legora's 1,000+ law firm customers across 50 markets, this scales: humans intervene where tasks are unverifiable (strategy), while agents handle the verifiable subtasks.

Replace Chat with Vertical-Specific Artifacts for High-Bandwidth Collaboration

Chat collapses these trees into a linear, low-bandwidth stream, overwhelming users with 50+ questions stripped of context. Use persistent, industry-tailored artifacts instead: documents for highlighting clauses, commenting, tagging agents and collaborators, and handing off sections; tabular reviews that flag issues across contracts for quick human judgment, after which the agent auto-continues. Language excels as input (flexible, voice-compatible), but agents transcend human limits, so don't constrain their output to chat. UI is converging on post-hoc, non-linear views: chat initiates, artifacts sustain collaboration, enabling control (steer precisely), trust (easy review), and the efficiency that vertical AI goals like end-to-end legal work demand.
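A persistent artifact of this kind can be sketched as a shared data structure: sections carry an owner, comments, and a flag, so the tabular-review view surfaces only what needs human judgment. This is an illustrative model under assumed names, not a real product schema.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    heading: str
    body: str
    owner: str = "agent"                       # who currently holds the section
    comments: list[str] = field(default_factory=list)
    flagged: bool = False                      # surfaced for quick human judgment

@dataclass
class ReviewArtifact:
    """A persistent document both humans and agents edit, instead of a chat stream."""
    sections: list[Section]

    def flag(self, i: int, comment: str) -> None:
        """Agent raises an issue on one section without derailing the rest."""
        self.sections[i].flagged = True
        self.sections[i].comments.append(comment)

    def hand_off(self, i: int, to: str) -> None:
        """Transfer ownership of a section, e.g. from agent to a human reviewer."""
        self.sections[i].owner = to

    def open_items(self) -> list[str]:
        """Tabular-review view: only flagged sections demand attention."""
        return [s.heading for s in self.sections if s.flagged]
```

Because state lives in the artifact rather than the chat history, the human can resolve the flagged rows in any order and the agent simply continues from the updated document, which is the non-linear, post-hoc review the section describes.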


© 2026 Edge