Claude Opus 4.7: 10-Point Coding Gains, Smarter Memory
Opus 4.7 beats Opus 4.6 by over 10 points on SWE-bench Pro, handles unsupervised engineering tasks better, uses file-based memory efficiently, and adds API task budgets. Priced at $5/M input and $25/M output tokens.
Superior Coding and Creative Output
Claude Opus 4.7 tackles complex software engineering tasks that previously required close human supervision, posting significant benchmark jumps: more than 10 percentage points over Opus 4.6 on SWE-bench Pro, plus gains on multilingual SWE-bench. Users can offload front-end development, high-quality interfaces, slides, and docs, where the model shows improved taste and creativity. It also resists sycophancy by offering opinionated feedback, for example critiquing suboptimal architecture during code scaffolding and suggesting alternatives, which leads to better final outcomes. One caveat for agentic coding: max effort on 4.7 consumes substantially more tokens than on 4.6, so after upgrading, drop to high effort to control costs.
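The effort downgrade above is a one-line change in the request. A minimal sketch, assuming effort is passed as a top-level string field; the field name, the model identifier, and the request shape here are illustrative assumptions, not the confirmed API:

```python
def build_request(prompt: str, effort: str = "high") -> dict:
    """Assemble a chat request body. The 'effort' field is hypothetical,
    illustrating the max -> high downgrade recommended after upgrading."""
    allowed = {"high", "extra high", "max"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {allowed}")
    return {
        "model": "claude-opus-4-7",  # assumed identifier
        "effort": effort,            # drop from "max" to "high" to cut token spend
        "messages": [{"role": "user", "content": prompt}],
    }
```

Keeping the effort level in one place like this makes the cost/quality trade-off a config change rather than a code change.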
Enhanced Vision, Reasoning, and Memory
Vision capabilities improve on agentic computer use and visual reasoning metrics (e.g., web navigation), outperforming Opus 4.6 but trailing the Mythos preview. File-system-based memory is a highlight: the model recalls key notes across multi-session workflows, reducing upfront context needs and enabling progressive disclosure, scanning directories and reading only the relevant markdown, scripts, or other files as needed. Anthropic pioneered this approach via tools like Claude Code and skills, making long-running tasks more efficient.
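The progressive-disclosure pattern can be sketched generically: instead of loading every note into context, an agent lists a memory directory and reads only the files relevant to the current task. A minimal sketch, where the directory layout and keyword-matching heuristic are illustrative, not Anthropic's implementation:

```python
from pathlib import Path

def relevant_memory(memory_dir: str, keywords: list[str]) -> dict[str, str]:
    """Scan a memory directory and read only the markdown notes whose
    filenames mention a task keyword, rather than loading everything."""
    loaded = {}
    for path in sorted(Path(memory_dir).glob("*.md")):
        if any(kw in path.stem.lower() for kw in keywords):
            loaded[path.name] = path.read_text()  # disclose only on match
    return loaded

# With notes deploy.md, billing.md, and style.md on disk, a deployment
# task loads just the deploy note: relevant_memory("memory", ["deploy"])
```

The point of the pattern is that context cost scales with the task, not with the total size of accumulated memory.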
Practical API and Tooling Controls
Available now via the API, Claude Code, and the web/desktop apps at $5 per million input tokens and $25 per million output tokens. New API features include an 'extra high' effort level, sitting between high and max, for finer reasoning control, and beta task budgets that let Claude prioritize work under cost constraints, useful for scenarios like vending bench simulations where models manage business operations under spending limits. Claude Code adds 'ultra review' for dedicated change reviews that flag issues a human reviewer would catch, and extends auto mode to Max users as a safer alternative to bypassing permissions.
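At the stated rates, per-request cost is straightforward arithmetic, and a client-side check gives a rough feel for what the beta task budgets constrain. The helper below is a hypothetical sketch using only the announced list prices:

```python
INPUT_PER_MTOK = 5.00    # $5 per million input tokens (announced rate)
OUTPUT_PER_MTOK = 25.00  # $25 per million output tokens (announced rate)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at Opus 4.7 list prices."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

def within_budget(input_tokens: int, output_tokens: int, budget_usd: float) -> bool:
    """Crude client-side analogue of a task budget: would this request
    stay under the dollar cap?"""
    return request_cost(input_tokens, output_tokens) <= budget_usd

# A 40k-token prompt with a 6k-token reply:
# 40_000 * 5/1e6 + 6_000 * 25/1e6 = 0.20 + 0.15 = $0.35
```

Note the 5:1 output-to-input price ratio: long replies, not long prompts, dominate spend, which is why effort level is the main cost lever.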