The GPT-5.6 Model Series
OpenAI has introduced three new models in the GPT-5.6 series, each optimized for different use cases:
- Sol: The flagship model, featuring a new max reasoning effort mode for deep analysis and an ultra mode that utilizes subagents to coordinate complex, multi-step tasks.
- Terra: A balanced model designed for everyday professional workflows, offering performance competitive with GPT-5.5 at 50% of the cost.
- Luna: An efficient, low-cost model optimized for speed and affordability.
Performance and Benchmarks
The series demonstrates significant gains in agentic workflows, particularly in technical domains:
- Coding: GPT-5.6 Sol sets a new state-of-the-art on Terminal-Bench 2.1, which evaluates command-line planning and tool coordination.
- Biology: On GeneBench v1, the model shows improved performance in long-horizon genomics and quantitative analysis while consuming fewer tokens.
- Cybersecurity: The models show improved efficiency in vulnerability research. On ExploitBench, Sol matches the performance of the Mythos Preview while using only 33% of the output tokens. All three models show improved cyber capabilities on the ExploitGym benchmark as reasoning effort increases.
Layered Safety and Deployment
GPT-5.6 launches with a multi-layered safety architecture designed to mitigate misuse while supporting defensive security work. Key components include:
- Model-Level Safeguards: Training to refuse prohibited cyber assistance, even when users attempt jailbreaking or intent-masking.
- Real-Time Classifiers: A secondary layer that monitors output generation. High-risk requests trigger a pause, during which a larger reasoning model reviews the context before the output is allowed or withheld.
- Phased Release: The models are currently in a limited preview with trusted partners, coordinated with the U.S. government. OpenAI explicitly states that this is a short-term measure and that it does not intend for government-gated access to become the long-term standard for model releases.
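
The classifier layer described above can be pictured as a simple two-stage gate. The following is only an illustrative sketch, not OpenAI's implementation: the names `Request`, `Decision`, `layered_check`, `risk_score`, `reviewer`, and the `threshold` value are all hypothetical, chosen to show the control flow of "fast classifier first, slower reviewer only on high-risk requests."

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    WITHHOLD = "withhold"


@dataclass
class Request:
    """Hypothetical container for a prompt and the output being generated."""
    prompt: str
    draft_output: str


def layered_check(
    request: Request,
    risk_score: Callable[[Request], float],
    reviewer: Callable[[Request], Decision],
    threshold: float = 0.8,  # assumed cutoff, not a published value
) -> Decision:
    """Two-stage gate: a cheap classifier runs on every request, and only
    requests scored at or above `threshold` are paused for review."""
    if risk_score(request) < threshold:
        # Low-risk path: output proceeds without invoking the reviewer.
        return Decision.ALLOW
    # High-risk path: generation is paused and the reviewer decides.
    return reviewer(request)
```

In this sketch the reviewer is only called on the high-risk path, which mirrors the idea that the more expensive review step is reserved for the small share of requests the real-time classifier flags.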
Cyber Preparedness
According to OpenAI's internal Preparedness Framework, GPT-5.6 Sol remains below the 'Cyber Critical' threshold. While the model can identify exploitation primitives and bugs in browsers like Chromium and Firefox, it did not autonomously produce functional full-chain exploits during testing.