Claude Opus 4.7: 13% Coding Gains, 3x Vision for Agents

Agentic Coding Improvements Enable Autonomous Workflows

Claude Opus 4.7 outperforms Opus 4.6 by 13% on a 93-task coding benchmark, solving four tasks that prior models could not handle. On CursorBench it reaches 70%, up from 58%. For multi-step workflows, it gains 14% in accuracy while using fewer tokens and making one-third fewer tool errors. It is also the first model to pass implicit-need tests, continuing execution despite tool failures. Opus 4.7 verifies its own outputs before reporting, closing a loop that earlier versions left open, so builders can hand off complex coding work that previously required supervision. This suits CI/CD pipelines and agentic setups that depend on the model checking its own rigor and instruction adherence.

High-Resolution Vision Fixes Real-World Multimodal Bottlenecks

Opus 4.7 processes images up to 2,576 pixels on the long edge (about 3.75 megapixels), more than three times the capacity of prior Claude models. This resolves fine details in dense UIs, engineering diagrams, and screenshots, where the earlier limits caused failures despite strong reasoning. One computer-use tester saw visual-acuity scores jump from 54.5% with Opus 4.6 to 98.5%, eliminating their top pain point. The higher resolution unlocks pixel-perfect data extraction and agentic vision tasks, but it also increases token consumption, so downsample non-critical images to save tokens.
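
The downsampling advice can be sketched as a small sizing helper. This is a minimal illustration, not an official utility: the function name fit_long_edge and the 1,024-pixel cap for non-critical images are assumptions chosen for the example, and only the 2,576-pixel limit comes from the text above. The helper computes target dimensions only; the actual resize would be done with whatever image library is already in use.

```python
MAX_LONG_EDGE = 2576        # long-edge limit quoted in the text above
NON_CRITICAL_EDGE = 1024    # hypothetical cap for images where fine detail is not needed


def fit_long_edge(width: int, height: int, max_long_edge: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Return (width, height) scaled down, preserving aspect ratio,
    so that the longer side is at most max_long_edge.
    Images already within the limit are returned unchanged (never upscaled)."""
    if width <= 0 or height <= 0 or max_long_edge <= 0:
        raise ValueError("dimensions and limit must be positive")
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    # Clamp to at least 1 pixel so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


def target_size(width: int, height: int, critical: bool) -> tuple[int, int]:
    """Pick the full limit for detail-critical images and the smaller cap otherwise."""
    limit = MAX_LONG_EDGE if critical else NON_CRITICAL_EDGE
    return fit_long_edge(width, height, limit)
```

For example, a dense UI screenshot would be sized with target_size(w, h, critical=True) to keep as much detail as the limit allows, while a decorative or contextual image would use critical=False to trade detail for fewer tokens.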

Controls and Tools for Long-Horizon Execution

New API options include an xhigh effort level (between high and max) and task budgets for managing compute. Claude Code adds /ultrareview, with three free trials for Pro and Max users; it generates senior-engineer-style reviews that flag bugs and design issues in changes, which makes it well suited to pre-merge checks on complex PRs. Auto mode extends to Max users, letting Claude approve decisions on its own so that long tasks, such as overnight agents working over large codebases, run uninterrupted. Enhanced file-system memory retains notes across sessions, reducing the context each session needs. Opus 4.7 also reaches state-of-the-art on GDPval-AA for knowledge work in finance and legal domains.