Opus 4.7 Beats 4.6 on Long Coding Tasks with Full Features

Larger Context and Planning Enable 4.7's Full Delivery

Claude Opus 4.7 completed all 20 phases of a Laravel/React/Inertia project—including seeding data, role-based dashboards (admin, agent, customer), request submission, assignment queues, filtering, and permissions—in 34.5 minutes. It displayed a visible task progress list throughout, used 25% of its 1M token context (automatic on Claude Max $100 plan), passed 116 tests, and produced a working app after a quick npm run build fix. In contrast, Opus 4.6, limited to 200K context, reached 79% usage by task 17, triggered a context limit error at 34 minutes despite confirming task completion, and required manual database migration/refresh plus fixes for caching deserialization and missing 403 handling. Result: 4.6 delivered incomplete dashboards without request submission, agent queues, or full permissions—many pages were stubs labeled "this page will be built," despite identical prompts claiming full delivery.

To replicate success, use Claude Max plan for 1M context on 4.7 (model ID auto-enables it); 4.6 lacks clear 1M ID in docs/API. Set effort to "high" via /effort (not default "x-high" on 4.7, which burns more tokens) for fair, efficient runs.

4.7 Produces Cleaner, More Granular Code

Opus 4.7 organized routes with nested middleware groups (e.g., role-based subgroups for admin/agent/customer), granular controller names (e.g., RequestAssignmentController for admin, UnassignedQueueController for agents), and complete separation of responsibilities. Policies handled authorization robustly; front-end included real forms with filtering. Codex analysis confirmed: 4.7 built actual features (customer request form, admin assignments, agent queues with filters) vs. 4.6's placeholders/stubs; better naming avoided monolithic RequestController; more complete policies over 4.6's hardcoded role checks.

Tests: 4.6 wrote more but with fewer assertions. Overall, 4.7's codebase supports full functionality without cutting corners, even under similar token loads—ideal for production-like multi-role apps where stubs fail user testing.

Equal Token Costs Hide Efficiency Gains, But Stability Hurts

Both models used 22% of session tokens (Claude Max plan) at high effort, despite 4.7 delivering more. 4.7 communicated less (fewer explanations, more test/commit loops), showed persistent task plans, and stayed token-efficient by focusing on action over narration—Codex noted 4.6's verbosity likely offsets this in practice. Default x-high effort on 4.7 risks higher costs/slower speed; always check/adjust.

Caveats: 4.7 threw 500 server errors on first run (status.anthropic.com falsely showed operational); Anthropic's releases remain unstable (rate limit tweaks, hotfixes, 98% uptime). Security overreach refused some prompts detecting "malware" (even in system prompts). Use 4.7 for long tasks only if stable—4.6 may suffice for shorter ones without context woes.

Larger Context and Planning Enable 4.7's Full Delivery

4.7 Produces Cleaner, More Granular Code

Equal Token Costs Hide Efficiency Gains, But Stability Hurts

More on Edge

AI Coders Default to Hardcoded Keyword Rules

GPT 5.5 Tops Opus 4.7 and DeepSeek V4 in Coding Benchmarks

GPT-5.5 Outpaces Opus 4.7 in Speed and Token Efficiency

Kimi K 2.6 Rivals Opus/GPT-4 on Laravel Tasks, Cheaper