The Shift from Writing to Verifying
For years, code review rested on a 'happy accident' of engineering: senior developers could read code faster than juniors could write it, which made review a natural, low-friction way to share knowledge. That dynamic no longer holds. Agents now generate code at machine speed, while human reading speed stays constant. The result is a bottleneck in which teams are drowning in PRs, with a 31% increase in zero-review merges and a 441% increase in review duration.
The Reality of AI-Generated Code
Data from 2026 confirms that AI increases raw output by roughly 4x, but delivered value only by about 12%. The gap is filled with 'AI slop': code that is often well-formatted but logically flawed. Studies show AI-authored changes carry 1.7x more issues, including security vulnerabilities and readability problems. Crucially, mature teams with disciplined processes are just as vulnerable as everyone else, because the volume of output overwhelms workflows that were designed around human writing speed.
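The arithmetic behind these figures can be made explicit. The sketch below only combines the numbers quoted above; how they combine is an assumption of this example, not something the underlying data states. In particular, it reads "4x" and "12%" as multipliers against the same pre-AI baseline, and it assumes the 1.7x issue rate applies to every change, which is the worst case where all output is AI-authored.

```python
# Figures quoted in the text; the way they are combined here is an assumption.
OUTPUT_MULTIPLIER = 4.0      # "roughly 4x" raw output
VALUE_MULTIPLIER = 1.12      # "about 12%" more delivered value
ISSUE_RATE_MULTIPLIER = 1.7  # "1.7x more issues" per AI-authored change


def value_per_unit_output(output_mult: float, value_mult: float) -> float:
    """Delivered value per unit of code, relative to a pre-AI baseline of 1.0."""
    return value_mult / output_mult


def issue_volume(output_mult: float, issue_rate_mult: float) -> float:
    """Relative issue count reaching review if every change were AI-authored."""
    return output_mult * issue_rate_mult


density = value_per_unit_output(OUTPUT_MULTIPLIER, VALUE_MULTIPLIER)
issues = issue_volume(OUTPUT_MULTIPLIER, ISSUE_RATE_MULTIPLIER)
print(f"value per unit of output: {density:.2f}x baseline")
print(f"issues reaching review: {issues:.1f}x baseline")
```

Under these assumptions each unit of code carries roughly a quarter of the value it used to, while reviewers face several times the issue volume, which is one way to read the gap between output and value.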
Redefining the Purpose of Review
Review is no longer just about catching bugs; it is about reconstructing intent. When a human writes code, the 'why' is implicit but still held by the author. When an agent writes code, the reasoning is often discarded once the code is generated. Reviewers are now forced to act as the 'first human to ever lay eyes on this code,' reverse-engineering intent that was never documented. The solution is not to stop using AI, but to require agents to attach decision logs and rationale to PRs, so that intent is explicit rather than implicit.
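One possible shape for such a decision log is sketched below. The `Decision` and `DecisionLog` types, their field names, and the example webhook scenario are all hypothetical and are not taken from any particular tool; the point is only that a log with a stated goal and a rationale per decision gives the reviewer something to check the code against.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One choice the agent made, with the reasoning it would otherwise discard."""
    choice: str
    rationale: str
    rejected_alternatives: list[str] = field(default_factory=list)


@dataclass
class DecisionLog:
    """Hypothetical record an agent attaches to a PR."""
    goal: str
    decisions: list[Decision] = field(default_factory=list)

    def is_reviewable(self) -> bool:
        # A missing goal or an unexplained decision leaves intent implicit.
        has_goal = bool(self.goal.strip())
        all_explained = all(d.rationale.strip() for d in self.decisions)
        return has_goal and bool(self.decisions) and all_explained

    def to_pr_section(self) -> str:
        """Render the log as plain text for a PR description."""
        lines = [f"Goal: {self.goal}", "Decisions:"]
        for d in self.decisions:
            lines.append(f"- {d.choice}")
            lines.append(f"  Why: {d.rationale}")
            for alt in d.rejected_alternatives:
                lines.append(f"  Rejected: {alt}")
        return "\n".join(lines)


log = DecisionLog(
    goal="Retry failed webhook deliveries without duplicating side effects",
    decisions=[
        Decision(
            choice="Store an idempotency key per delivery",
            rationale="Receivers may process the same event twice during retries",
            rejected_alternatives=["Rely on receiver-side deduplication"],
        )
    ],
)
print(log.to_pr_section())
```

A check like `is_reviewable()` could run before a PR enters the human queue, so that PRs without a usable rationale are sent back rather than reverse-engineered.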
The Case for Heterogeneous Review
There is no single 'best' AI reviewer. Experiments running multiple tools in parallel (e.g., CodeRabbit, Greptile, Sentry Seer) show that different models catch different classes of bugs with almost zero overlap. The most effective strategy is to run two or more reviewers with different 'characters', for example one focused on correctness and another on production-failure severity, rather than relying on a single tool or on multiple instances of the same model.
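The overlap claim is something a team can measure on its own PRs. The sketch below assumes each reviewer's output has already been normalized into `(file, line, category)` tuples; that normalization, the reviewer labels, and the sample findings are all hypothetical, and no real tool's API is used. It computes the Jaccard overlap for each pair of reviewers and the merged set of findings.

```python
from itertools import combinations

# A finding is (file path, line number, issue category); an assumed normalized form.
Finding = tuple[str, int, str]


def jaccard(a: set[Finding], b: set[Finding]) -> float:
    """Share of findings two reviewers have in common (0.0 = no overlap)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(
    findings_by_reviewer: dict[str, set[Finding]],
) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of reviewers, keyed by sorted name pair."""
    return {
        (r1, r2): jaccard(findings_by_reviewer[r1], findings_by_reviewer[r2])
        for r1, r2 in combinations(sorted(findings_by_reviewer), 2)
    }


def merged_findings(findings_by_reviewer: dict[str, set[Finding]]) -> set[Finding]:
    """Union of all findings; what running the reviewers together yields."""
    return set().union(*findings_by_reviewer.values())


# Hypothetical output from two reviewers with different "characters".
findings = {
    "correctness_reviewer": {
        ("billing.py", 42, "off-by-one"),
        ("auth.py", 10, "missing-null-check"),
    },
    "severity_reviewer": {
        ("auth.py", 10, "missing-null-check"),
        ("queue.py", 88, "unbounded-retry"),
    },
}

overlap = pairwise_overlap(findings)
combined = merged_findings(findings)
print(overlap)
print(len(combined), "distinct findings")
```

A low overlap score means the second reviewer is contributing findings the first one missed; a score near 1.0 suggests the pair is redundant and one of them could be swapped for a reviewer with a different focus.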
Human-on-the-Loop vs. Human-in-the-Loop
We are moving toward a 'human-on-the-loop' model. Instead of reviewing every line, engineers must act as auditors. By using AI to triage PRs into 'safe,' 'needs work,' and 'high-risk' buckets, developers can allocate their limited attention to the high-blast-radius paths where being wrong is costly. The goal is to set the 'human dial' based on the project's blast radius, code longevity, and team size, rather than adhering to a one-size-fits-all review policy.
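As a rough illustration, the dial and the triage buckets can be combined into a review policy. Everything numeric in the sketch below is an assumption made for the example: the 1 to 5 scales, the weights, the thresholds, and the idea that a larger team raises the dial are not from the source, and a real team would set them from its own incident history.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    """Triage buckets named in the text, assumed to come from an AI triage step."""
    SAFE = "safe"
    NEEDS_WORK = "needs work"
    HIGH_RISK = "high-risk"


@dataclass(frozen=True)
class ProjectContext:
    blast_radius: int    # assumed scale: 1 (prototype) to 5 (payments, auth)
    code_longevity: int  # assumed scale: 1 (throwaway) to 5 (maintained for years)
    team_size: int       # engineers who depend on the code


def human_dial(ctx: ProjectContext) -> float:
    """Return a 0..1 setting; weights are illustrative, with blast radius dominant."""
    radius = (ctx.blast_radius - 1) / 4
    longevity = (ctx.code_longevity - 1) / 4
    team = min(ctx.team_size, 20) / 20  # assumption: larger teams raise the dial
    return round(0.6 * radius + 0.25 * longevity + 0.15 * team, 2)


def review_action(bucket: Bucket, dial: float) -> str:
    """Map a triaged PR and the project's dial setting to a review depth."""
    if bucket is Bucket.HIGH_RISK or dial >= 0.7:
        return "deep human review"
    if bucket is Bucket.NEEDS_WORK:
        return "return to author with AI findings"
    if dial >= 0.3:
        return "human spot-check"
    return "automated checks only"


prototype = ProjectContext(blast_radius=1, code_longevity=1, team_size=2)
payments = ProjectContext(blast_radius=5, code_longevity=5, team_size=20)

print(review_action(Bucket.SAFE, human_dial(prototype)))
print(review_action(Bucket.SAFE, human_dial(payments)))
```

The design choice worth noting is that a high dial overrides a 'safe' triage label: on a high-blast-radius project, the AI's classification alone is not treated as sufficient grounds to skip human review.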
Key Takeaways
- Match effort to risk: Stop reviewing every PR with the same intensity. Use a tiered approach where config changes get automated checks, while core business logic receives deep human review.
- Capture intent early: Require agents to output a decision log or rationale. This reduces the 'reconstruction cost' for human reviewers.
- Diversify your reviewers: Run at least two AI reviewers with different strengths (e.g., one for correctness, one for security) to cover blind spots.
- Audit, don't just read: Use AI to triage incoming queues. Spend your time confirming the 'safe' merges and focusing deep attention on the 'high-risk' ones.
- Beware of borrowed confidence: Avoid closed-loop systems where models review other models from the same family; they often share the same blind spots and will confidently agree on incorrect code.
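The last takeaway can be enforced mechanically when the reviewer panel is assembled. The sketch below is a minimal illustration that assumes each model can be tagged with a family label; the `Model` type, the model names, and the family labels are hypothetical placeholders rather than real products. It drops any reviewer that shares a family with the code's author and keeps at most one reviewer per remaining family.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    family: str  # assumed label, e.g. the vendor or base model lineage


def independent_reviewers(author: Model, candidates: list[Model]) -> list[Model]:
    """Keep reviewers whose family differs from the author's and from each other."""
    seen_families = {author.family}
    panel = []
    for candidate in candidates:
        if candidate.family in seen_families:
            continue  # same family: likely to share the author's blind spots
        seen_families.add(candidate.family)
        panel.append(candidate)
    return panel


# Hypothetical models; names and families are placeholders.
author = Model(name="coder-large", family="family-a")
candidates = [
    Model(name="reviewer-small", family="family-a"),  # same family as the author
    Model(name="checker-1", family="family-b"),
    Model(name="checker-2", family="family-b"),       # duplicate family
    Model(name="auditor", family="family-c"),
]

panel = independent_reviewers(author, candidates)
print([m.name for m in panel])
```

If the resulting panel has fewer than two members, that is a signal to add a reviewer from another family or to raise the amount of human review for that PR.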
Notable Quotes
- "We poured machine-speed output into a system built for human-speed work. The bottleneck did not disappear; it moved to verification."
- "The gap between 4x the code and a tenth more value is the review problem stated in one line."
- "Reviewing an agent’s PR made them the first human being to ever lay eyes on this code."
- "The human does the expensive thinking before the code exists and the machine does the line-by-line afterward, which may well be the shape of where this goes."
- "How much human you keep is a dial, and you set it by blast radius, not by guilt."