GPT-5.5 xHigh Reasoning Builds Deeper Production Code
In GPT-5.5 tests on a Laravel/Filament task, xHigh used 44% of the session limit (over 4x Medium's 10%) and took 14 minutes vs. Medium's 6, but it added policies, extra tests, and eager preloads: a cost worth paying where auth or data integrity is at risk.
xHigh Spends More on Exploration for Thoroughness
Testing the same prompt across GPT-5.5 reasoning levels on phase 6.3 of a Laravel/Filament project (build an application details page with buttons and sections) revealed stark resource differences. Medium finished in 6 minutes using 10% of the 5-hour session limit. High doubled that: 12 minutes and 18% usage. xHigh took 14 minutes but consumed 44% (over 4x Medium), mostly on a 7.5-minute exploration of 30+ files, migrations included, before editing. Editing and test runs were comparably fast across levels, but xHigh added an extra test after the suite passed, double-checked its work, and marked the task complete only after deeper validation.
This upfront thinking pays off in accuracy: by the 6-minute mark Medium had already finished with a skeleton, High passed tests at 10 minutes, and xHigh was still reading files. Token costs scale with reasoning depth, not just output: xHigh's logs hit 1,200 lines vs. hundreds for the other levels.
Higher Levels Shift from Fast to Idiomatic to Over-Engineered
Code analysis by Claude Opus highlighted a clear architectural progression. Medium took the simplest path: an inline InfoList in the Filament admin, requiring no deep Filament knowledge. High used the textbook structure: a dedicated InfoList class, per the docs. xHigh built a rich read-only InfoList schema with helper methods, preloading tags for performance and handling soft deletes (withTrashed).
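As a rough illustration of the xHigh-style structure described above, here is a hedged PHP sketch of a dedicated, read-only Filament detail page with eager loading and soft-delete handling. Every class, relation, and field name is hypothetical (the actual project's code is not shown in the source), and the API follows Filament v3 conventions, which may differ from the version the project used.

```php
<?php

namespace App\Filament\Resources\ApplicationResource\Pages;

use App\Filament\Resources\ApplicationResource; // hypothetical resource
use Filament\Infolists\Components\Section;
use Filament\Infolists\Components\TextEntry;
use Filament\Infolists\Infolist;
use Filament\Resources\Pages\ViewRecord;
use Illuminate\Database\Eloquent\Model;

class ViewApplication extends ViewRecord
{
    protected static string $resource = ApplicationResource::class;

    // Eager-load relations and include soft-deleted rows, so the page
    // renders without N+1 queries and still resolves trashed records:
    // the "preloads" and withTrashed touches credited to xHigh.
    protected function resolveRecord(int|string $key): Model
    {
        return static::getResource()::getEloquentQuery()
            ->withTrashed()
            ->with(['tags', 'owner']) // hypothetical relations
            ->findOrFail($key);
    }

    public function infolist(Infolist $infolist): Infolist
    {
        return $infolist->schema([
            Section::make('Details')->schema([
                TextEntry::make('name'),
                TextEntry::make('status')->badge(),
                TextEntry::make('tags.name')->listWithLineBreaks(),
            ]),
        ]);
    }
}
```

Medium's inline approach would instead declare the same entries directly in the resource's `infolist()` method; the dedicated class is what separates High and xHigh from it.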
Authorization showed the biggest gap. Medium ignored Laravel policies entirely. High added scoped queries and visibility checks but no policy edits. xHigh implemented defense-in-depth: policy helpers backed by server-side checks. For the wire:chat package integration, Medium did the basics, High added features, and xHigh layered on more. Tests followed the same pattern: Medium stayed minimal, High used Filament v5 idioms, and xHigh added bonus coverage for edge cases.
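The defense-in-depth pattern attributed to xHigh can be sketched as a Laravel policy plus a server-side re-check inside the Filament action, so that hiding a button in the UI (roughly High's approach) is reinforced by an authorization check a forged request cannot bypass. Model, policy, rule, and action names below are assumptions for illustration only.

```php
<?php

// app/Policies/ApplicationPolicy.php (hypothetical)
namespace App\Policies;

use App\Models\Application;
use App\Models\User;

class ApplicationPolicy
{
    public function view(User $user, Application $application): bool
    {
        // Assumed rule: admins see everything, owners see their own.
        return $user->is_admin || $user->id === $application->user_id;
    }

    public function delete(User $user, Application $application): bool
    {
        return $user->is_admin;
    }
}

// In the Filament page (sketch): ->visible() only hides the button,
// while Gate::authorize re-checks the same policy server-side.
use Filament\Actions\Action;
use Illuminate\Support\Facades\Gate;

Action::make('delete')
    ->visible(fn (): bool => auth()->user()->can('delete', $this->record))
    ->action(function (): void {
        Gate::authorize('delete', $this->record); // server-side check
        $this->record->delete();
    });
```

The second check is the "defense-in-depth" part: visibility alone is a client-facing convenience, not an access control.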
All three levels passed their tests green, but xHigh anticipated future issues (auth logic, data integrity, adjacent tasks), not just the prompt's literal requirements.
Use xHigh for Production Risks, Medium for Quick Tasks
Claude's verdict: Medium implements the prompt literally, taking the fastest route. High follows the docs idiomatically. xHigh notices the extras, like permissions, producing production-grade code that is safer where auth and data risks exist. The trade-off is cost: roughly 4x the tokens for similar editing time, though the clearer logs and code diffs help justify it. Before testing, the differences seemed subtle; the results showed xHigh reasons about the project's future, not just the present task. For low-risk prototypes, stick with Medium. Scale up to xHigh when deploying code that touches security or data integrity.