Claude Opus 4.7: 3x Image Resolution, Self-Verifying Agents, 70% on CursorBench

Agentic Coding and Long-Horizon Reliability

Claude Opus 4.7 outperforms Opus 4.6 by 13% on a 93-task coding benchmark, including 4 tasks that neither prior Opus models nor Sonnet could solve, and scores 70% on CursorBench (up from 58%). On multi-step workflows it improves by 14% while using fewer tokens and making a third fewer tool errors, and it passes implicit-need tests by continuing through tool failures. The model now verifies its own outputs before reporting back, closing a loop absent in prior versions, so builders can hand off complex, unsupervised tasks with more confidence. That moves agentic workflows from supervised toward autonomous operation, a good fit for CI/CD pipelines and overnight codebase agents tackling hard engineering work.
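Opus 4.7 now does this checking itself, but a pipeline that hands it unsupervised work can still enforce the same verify-before-report rule as an explicit gate. The sketch below assumes two hypothetical callables you would supply: `run_agent`, which invokes your agent with the task plus any earlier failure messages, and `verify`, which runs your tests and returns None on success or an error string. Neither is an Anthropic API; this is one way to structure the harness, not a prescribed one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    """Result reported back to the pipeline once the agent run ends."""
    ok: bool
    attempts: int
    output: Optional[str]
    failures: List[str]


def run_with_verification(
    task: str,
    run_agent: Callable[[str, List[str]], str],  # hypothetical: returns a candidate output
    verify: Callable[[str], Optional[str]],      # hypothetical: None if ok, else an error message
    max_attempts: int = 3,
) -> Outcome:
    """Report success only after an output has passed verification.

    Each failure message is fed into the next attempt instead of ending
    the run, so the agent keeps working through errors up to max_attempts.
    """
    failures: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = run_agent(task, failures)
        error = verify(candidate)
        if error is None:
            return Outcome(ok=True, attempts=attempt, output=candidate, failures=failures)
        failures.append(error)
    return Outcome(ok=False, attempts=max_attempts, output=None, failures=failures)
```

In CI, the job would fail or open a ticket when `ok` is False, and the collected `failures` give the reviewer a trail of what the agent already tried.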

On GDPval-AA (knowledge-work tasks in fields such as finance and law), it achieves state-of-the-art results. It also makes better use of file-system memory, retaining notes across multi-session runs so that follow-on tasks need less up-front context.
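File-system memory means notes live outside the context window and are reloaded when a follow-on session starts. Below is a minimal stdlib sketch of that pattern, assuming a single local JSON-lines file; the `NotesMemory` class and the file name are hypothetical, and in a real agent the model would decide what to record through whatever file tools your harness exposes.

```python
import json
from pathlib import Path
from typing import List


class NotesMemory:
    """Append-only notes file that persists across agent sessions (hypothetical helper)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def add(self, session: str, note: str) -> None:
        # Append one JSON object per line so earlier notes are never rewritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"session": session, "note": note}) + "\n")

    def load(self) -> List[dict]:
        # A missing file simply means no earlier session left notes.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def as_context(self, max_notes: int = 20) -> str:
        """Render the most recent notes as a compact prompt prefix for the next session."""
        recent = self.load()[-max_notes:]
        return "\n".join(f"[{n['session']}] {n['note']}" for n in recent)
```

Capping `as_context` to recent notes is what keeps the follow-on prompt small: the next session starts from a short summary rather than the full transcript of earlier runs.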

High-Resolution Vision for Real-World Multimodality

Opus 4.7 accepts images up to 2,576 px on the long edge (about 3.75 megapixels), roughly 3x the pixels of prior Claude models, with no API changes required. Larger images consume more tokens, so downsample images where fine detail doesn't matter. The extra resolution fixes failure modes in computer-use agents reading dense UI screenshots, in data extraction from diagrams, and in work that needs pixel-perfect references: one tester's visual-acuity benchmark jumped from 54.5% (Opus 4.6) to 98.5% (Opus 4.7), eliminating their top pain point. That makes it a fit for production agents parsing engineering diagrams or screenshots, where lost detail previously blocked reasoning.
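To decide when downsampling is worth it, it helps to estimate the token cost first. The sketch below scales an image's dimensions to a chosen long edge while preserving aspect ratio, then estimates tokens with the rule of thumb from Anthropic's vision documentation, tokens ≈ (width × height) / 750. That formula was published for earlier Claude models, so treat it as an approximation for Opus 4.7; the function names and example sizes are hypothetical.

```python
import math
from typing import Tuple

MAX_LONG_EDGE = 2576  # long-edge limit stated for Opus 4.7


def fit_long_edge(width: int, height: int, max_long_edge: int = MAX_LONG_EDGE) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_long_edge, keeping aspect ratio."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height  # already small enough; leave untouched
    scale = max_long_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate tokens via the (w * h) / 750 rule of thumb (assumed to carry over to 4.7)."""
    return math.ceil(width * height / 750)


# Hypothetical dense UI screenshot captured at 2x scale.
full = fit_long_edge(5152, 2912)
print(full, estimate_image_tokens(*full))  # (2576, 1456) 5001

# Same image treated as non-critical: downsample to half that long edge.
half = fit_long_edge(5152, 2912, 1288)
print(half, estimate_image_tokens(*half))  # (1288, 728) 1251
```

Halving the long edge cuts the estimated cost by about 4x, since tokens track pixel area rather than edge length.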

Production Levers and Claude Code Tools

A new xhigh effort level sits between high and max, trading reasoning depth against latency; it is the default in Claude Code. Task budgets, now in public beta, cap token spend on long agent runs and parallel pipelines. For coding and agentic workloads in cost-sensitive production, start testing at high or xhigh effort to find the best tradeoff.

In Claude Code, the /ultrareview slash command flags bugs and design issues the way a senior reviewer would (Pro and Max users get three free reviews). Auto mode, now available to Max users, lets Claude approve decisions on its own so long tasks run uninterrupted, with lower risk than skipping permission checks entirely.
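Task budgets are enforced on Anthropic's side, and their request parameters are not described here. To illustrate the underlying idea of capping spend across parallel work, the sketch below is a hypothetical client-side analogue, not the task-budget API: several agent workers draw from one shared, thread-safe token pool, reserving an estimated cost before each step and stopping once the pool is exhausted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TokenBudget:
    """Thread-safe token pool shared by parallel agent workers (client-side, hypothetical)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.spent = 0
        self._lock = threading.Lock()

    def try_reserve(self, tokens: int) -> bool:
        """Reserve tokens if they fit in the remaining budget; return False otherwise."""
        with self._lock:
            if self.spent + tokens > self.limit:
                return False
            self.spent += tokens
            return True

    @property
    def remaining(self) -> int:
        with self._lock:
            return self.limit - self.spent


def worker(budget: TokenBudget, steps: int, tokens_per_step: int) -> int:
    """Run up to `steps` agent steps, stopping early once the shared budget runs out."""
    done = 0
    for _ in range(steps):
        if not budget.try_reserve(tokens_per_step):
            break
        # A real worker would call the model here and reconcile actual usage afterwards.
        done += 1
    return done


if __name__ == "__main__":
    budget = TokenBudget(limit=10_000)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda _: worker(budget, steps=5, tokens_per_step=1_000), range(4)))
    # Four workers want 20,000 tokens in total, but only 10 steps fit in the pool.
    print(sum(results), budget.remaining)  # 10 0
```

Reserving before each call, rather than recording usage afterwards, is what keeps concurrent workers from collectively overshooting the cap.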