Claude Opus 4.7: Coding Gains but Token Traps Ahead

Opus 4.7 tops Opus 4.6 in coding, multimodal agents, and file-based memory, but its literal instruction following demands prompt retuning, and teams should expect roughly 1-1.35x more input tokens plus faster output-token burn.

Core Capability Upgrades for Agentic Workflows

Opus 4.7 outperforms Opus 4.6 across key benchmarks, particularly in coding (agentic coding beats the prior version), instruction following, and multimodal understanding. It excels at high-resolution image processing, which makes it stronger for document-heavy agentic tasks: it surpasses Opus 4.6 substantially on multimodal agentic benchmarks, with accuracy tied to resolution (higher resolution yields better results but consumes more tokens). File-based memory handling improves significantly, aligning with tools like Claude Code that treat files as persistent memory rather than relying on semantic search, which boosts coding agents (a sketch of the pattern follows below). Document reasoning and the 1M-token context window show better long-term coherence, for example on Vending-Bench 2. Self-verification during code generation is now trained in. Compared to the Methus preview, Opus 4.7 is weaker overall but edges ahead in agentic coding; tool use (search, computer use) is close, and cyber safeguards limit advanced-risk capabilities.
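
The file-as-memory pattern is concrete enough to sketch. Below is a minimal Python illustration of what it means to treat plain files as persistent agent memory instead of a semantic-search index; the FileMemory class and file layout are assumptions for illustration, not Anthropic's implementation.

```python
from pathlib import Path

class FileMemory:
    """Illustrative file-as-memory store: the agent persists notes to plain
    files and re-reads them on later turns, instead of querying a vector DB.
    This is an assumed sketch of the pattern, not Anthropic's implementation."""

    def __init__(self, root: str = "agent_memory"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def write(self, topic: str, note: str) -> None:
        # One file per topic; append so earlier notes survive across turns.
        path = self.root / f"{topic}.md"
        with path.open("a", encoding="utf-8") as f:
            f.write(note.rstrip() + "\n")

    def read(self, topic: str) -> str:
        # Returning the raw file keeps the memory human-inspectable,
        # which is the main appeal over opaque semantic search.
        path = self.root / f"{topic}.md"
        return path.read_text(encoding="utf-8") if path.exists() else ""

memory = FileMemory()
memory.write("build_failures", "2026-01-10: flaky test in test_parser.py")
print(memory.read("build_failures"))
```

The appeal is that the memory stays human-inspectable and versionable alongside the code, which suits coding agents working inside a repository.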

Prompt and Migration Trade-offs

Switching from Opus 4.6 requires retuning prompts: Opus 4.7 follows instructions literally where the prior model interpreted loosely or skipped steps, so unadjusted prompts can yield unexpected outputs. An updated tokenizer maps the same input to 1-1.35x more tokens depending on content (a rough cost estimate follows below). Higher default effort levels (extra high in Claude Code, between high and max) improve reliability on hard problems and later agent turns but generate more output tokens, accelerating rate-limit exhaustion and cost. Pricing matches Opus 4.6, signaling the same model class. Internally reported multimodal gains haven't always been replicated externally; SWE-bench Multimodal was run on an internal harness, so those scores aren't directly comparable to public results. Treat the benchmark numbers with that caveat in mind.
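
For a sense of what the 1-1.35x retokenization means for spend, here is a back-of-the-envelope estimate. The per-token price and monthly volume below are placeholders; the post only says pricing matches Opus 4.6.

```python
# Back-of-the-envelope input-cost impact of the retokenization.
# Price and volume are placeholder assumptions, not published figures.
PRICE_PER_MTOK_INPUT = 15.00  # USD per million input tokens (assumed)

def monthly_input_cost(tokens_per_month: float, multiplier: float) -> float:
    """Cost after the tokenizer maps the same text to `multiplier`x tokens."""
    return tokens_per_month * multiplier / 1_000_000 * PRICE_PER_MTOK_INPUT

baseline = monthly_input_cost(500_000_000, 1.0)   # same prompts on Opus 4.6
worst    = monthly_input_cost(500_000_000, 1.35)  # worst-case retokenization
print(f"baseline ${baseline:,.0f} -> worst case ${worst:,.0f} "
      f"({worst / baseline - 1:.0%} increase)")
```

Note this covers input tokens only; the higher default effort inflates output tokens on top of it.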

New API and Platform Tools

The API adds high-resolution image support and, in public beta, task budgets to cap and control token spend over long interactions (a hedged request sketch follows below). Claude Code now defaults to extra-high effort, fixing performance complaints about the prior medium default, but burns tokens faster; tune effort manually for routine coding. Ultra Review (the /ultra-review slash command) launches a dedicated session that scans changes and flags bugs and design issues like a human reviewer. The release timing (7:30am, a simple X thread, no demos) suggests a rush ahead of an expected OpenAI drop.
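
A minimal request sketch using the Anthropic Python SDK is below. The model ID, beta flag, and task-budget field names are assumptions inferred from the post, not confirmed API surface; check the official release notes before relying on them.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Model ID, beta flag, and budget field are assumptions based on the post,
# not documented API surface; verify against the release notes before use.
response = client.beta.messages.create(
    model="claude-opus-4-7",              # assumed model ID
    max_tokens=2048,
    betas=["task-budgets-2026-01-01"],    # hypothetical beta flag name
    extra_body={
        "task_budget": {"max_total_tokens": 200_000}  # hypothetical spend cap
    },
    messages=[{"role": "user", "content": "Review this diff for bugs: ..."}],
)
print(response.content[0].text)
```

The extra_body escape hatch is how the SDK passes parameters it doesn't yet model natively, which is why the sketch routes the hypothetical budget through it.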
