Validate Agent PRs with Correctness Checks First

Turn PR Issues into Enforceable Correctness Conditions

When a PR adds code, such as a new run script, but leaves the README untouched, don't just leave a comment. Define a correctness condition: "All PRs include updates to all relevant documentation files." This replaces a one-off fix with a guarantee that the agent system enforces on every PR.

Implement the condition with two levers: (1) update instructions in files like AGENTS.md so that coding agents proactively scan and edit docs; (2) deploy a reviewer agent that rejects PRs missing doc updates. Enforcing the condition keeps documentation in sync with the code automatically and reduces the need for manual scrutiny.
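The second lever can be sketched as a small check over a PR's changed file paths. This is a deterministic stand-in for the reviewer agent's decision, not the author's implementation: the names review_pr, is_doc, DOC_SUFFIXES, and DOC_DIRS are hypothetical, and "a doc file changed" is only a crude proxy for "all relevant documentation was updated," which a real reviewer agent would judge from the diff itself.

```python
from pathlib import PurePosixPath

# Assumed conventions for what counts as documentation (hypothetical).
DOC_SUFFIXES = {".md", ".rst"}
DOC_DIRS = {"docs"}


def is_doc(path: str) -> bool:
    """True if the path looks like a documentation file."""
    p = PurePosixPath(path)
    return p.suffix.lower() in DOC_SUFFIXES or bool(DOC_DIRS & set(p.parts))


def review_pr(changed_files: list[str]) -> tuple[bool, str]:
    """Return (approved, comment) for a list of changed file paths."""
    code_changes = [f for f in changed_files if not is_doc(f)]
    doc_changes = [f for f in changed_files if is_doc(f)]
    if code_changes and not doc_changes:
        # Reject with a comment the coding agent can act on.
        return False, (
            "Rejected: code changed (" + ", ".join(code_changes)
            + ") but no documentation was updated."
        )
    return True, "Approved: documentation condition satisfied."
```

For example, review_pr(["scripts/run.sh"]) is rejected, while review_pr(["scripts/run.sh", "README.md"]) is approved.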

This approach scales: each PR review becomes an audit of the system. The reviewer identifies which agent contexts or feedback loops need tuning, then tests those fixes on the current PR.

Prioritize Validation for Guaranteed Correctness

Start with validation, not instructions. Adding a reviewer agent first blocks invalid PRs outright, forcing the coding agent to iterate until the docs match the code changes. This provides a hard guarantee, which is far better than hoping improved instructions "work most of the time."

Instructions alone risk silent failures: a future PR can skip a doc update and nobody notices, because humans should not be inspecting every agent output in the first place. With validation in place, instructions become a later optimization: they shorten feedback loops, while correctness is already guaranteed by the reviewer.

The trade-off: without instructions, PRs take longer at first because each one may go through several reject-and-revise cycles. But nondeterministic agents need checks regardless; if the reviewer itself misses a violation, refine its prompt, which covers only this single task.
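The reject-and-revise cycle described above can be sketched as a loop in which the reviewer's comment is fed back to the coding agent until the PR passes. Everything here is illustrative: run_until_valid, stub_coding_agent, and stub_reviewer are hypothetical names, and the two stubs merely simulate an agent that forgets the docs until it is told about them.

```python
from typing import Callable


def run_until_valid(
    propose: Callable[[str], list[str]],
    review: Callable[[list[str]], tuple[bool, str]],
    max_rounds: int = 5,
) -> tuple[list[str], int]:
    """Ask the coding agent for changes until the reviewer approves.

    Returns the approved file list and the number of rounds it took.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        changed = propose(feedback)
        approved, comment = review(changed)
        if approved:
            return changed, round_no
        feedback = comment  # the rejection comment drives the next attempt
    raise RuntimeError(f"PR not approved after {max_rounds} rounds")


def stub_coding_agent(feedback: str) -> list[str]:
    """Simulated agent: skips the README unless feedback mentions documentation."""
    if "documentation" in feedback:
        return ["scripts/run.sh", "README.md"]
    return ["scripts/run.sh"]


def stub_reviewer(changed: list[str]) -> tuple[bool, str]:
    """Simulated reviewer: approves only if some Markdown file changed."""
    if any(f.endswith(".md") for f in changed):
        return True, "Approved."
    return False, "Missing documentation update."
```

With these stubs, run_until_valid(stub_coding_agent, stub_reviewer) is approved on the second round. The extra round is the cost the trade-off refers to; better instructions would aim to get approval on the first.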

Elevate Development to Agent System Testing

This mirrors test-first development at a higher level: property-based, not unit-specific. Instead of hard-coding "update README," assert the general property "docs reflect code post-change."
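The difference between the two kinds of check can be sketched as follows. The function names are hypothetical, and "every entry point is mentioned in the docs" is just one assumed, simplified reading of "docs reflect code"; substring matching is deliberately crude.

```python
def readme_was_touched(changed_files: list[str]) -> bool:
    """Hard-coded, unit-style check: only asks whether README.md changed."""
    return "README.md" in changed_files


def undocumented(entry_points: list[str], docs_text: str) -> list[str]:
    """Entry points (e.g. run scripts) that the docs never mention."""
    return [name for name in entry_points if name not in docs_text]


def docs_reflect_code(entry_points: list[str], docs_text: str) -> bool:
    """General property: after the change, every entry point is documented."""
    return not undocumented(entry_points, docs_text)
```

A PR that edits README.md without mentioning a newly added script passes readme_was_touched but fails docs_reflect_code, which is why the general property is the stronger assertion.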

It extends the Boy Scout Rule (leave the code cleaner than you found it) from cleaning code to strengthening the entire agent-driven pipeline: "programming the agents to program our software." Every PR tests and improves the automation system, turning each review into an opportunity for broader fixes.