The GPT-5.6 Model Series

OpenAI has introduced three new models in the GPT-5.6 series, each optimized for different use cases:

  • Sol: The flagship model, featuring a new max reasoning effort mode for deep analysis and an ultra mode that uses subagents to coordinate complex, multi-step tasks.
  • Terra: A balanced model designed for everyday professional workflows, offering performance competitive with GPT-5.5 at half the cost.
  • Luna: An efficient model optimized for speed and low cost.

Performance and Benchmarks

The series demonstrates significant gains in agentic workflows, particularly in technical domains:

  • Coding: GPT-5.6 Sol sets a new state of the art on Terminal-Bench 2.1, which evaluates command-line planning and tool coordination.
  • Biology: On GeneBench v1, the model shows improved performance in long-horizon genomics and quantitative analysis while consuming fewer tokens.
  • Cybersecurity: The models are more efficient at vulnerability research. On ExploitBench, Sol matches the performance of the Mythos Preview while using only 33% of the output tokens. On the ExploitGym benchmark, the cyber capabilities of all three models improve as reasoning effort increases.
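
As a rough illustration of what the ExploitBench token figure implies, the arithmetic can be sketched in Python. The 90,000-token baseline below is a made-up number chosen for easy arithmetic, not a reported result, and the function names are hypothetical.

```python
def output_tokens_at(baseline_tokens: int, percent_used: int) -> int:
    """Tokens consumed when a model uses `percent_used` percent of a baseline."""
    return baseline_tokens * percent_used // 100


def reduction_factor(percent_used: int) -> float:
    """How many times fewer tokens are used relative to the baseline."""
    return 100 / percent_used


# Hypothetical baseline: suppose the comparison model emitted 90,000 output tokens.
baseline = 90_000
sol_tokens = output_tokens_at(baseline, 33)  # 29,700 tokens
factor = reduction_factor(33)                # roughly 3x fewer tokens
print(sol_tokens, round(factor, 2))
```

In other words, matching performance at 33% of the output tokens corresponds to roughly a threefold reduction in generated text for the same result.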

Layered Safety and Deployment

GPT-5.6 launches with a multi-layered safety architecture designed to mitigate misuse while supporting defensive security work. Key components include:

  • Model-Level Safeguards: The models are trained to refuse prohibited cyber assistance, even when users attempt jailbreaks or mask their intent.
  • Real-Time Classifiers: A secondary layer that monitors output generation. High-risk requests trigger a pause, during which a larger reasoning model reviews the context before allowing or withholding the output.
  • Phased Release: The models are currently in a limited preview with trusted partners, coordinated with the U.S. government. OpenAI states that this is a short-term measure and that it does not intend government-gated access to become the long-term standard for model releases.
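
The layering described above can be pictured as a short decision pipeline. The following Python sketch is purely illustrative: the function names, the 0.8 risk threshold, and the callable interfaces are assumptions made for this example and do not reflect OpenAI's actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    allowed: bool
    stage: str  # which layer made the final call


def layered_check(
    text: str,
    model_refuses: Callable[[str], bool],
    risk_score: Callable[[str], float],
    reviewer_allows: Callable[[str], bool],
    pause_threshold: float = 0.8,  # hypothetical cutoff, not a published value
) -> Decision:
    # Layer 1: the model itself declines prohibited requests.
    if model_refuses(text):
        return Decision(allowed=False, stage="model")
    # Layer 2: a real-time classifier scores the exchange.
    if risk_score(text) < pause_threshold:
        return Decision(allowed=True, stage="classifier")
    # Layer 3: high risk pauses generation; a larger reviewer decides.
    return Decision(allowed=reviewer_allows(text), stage="reviewer")


# Toy stand-ins for the three layers (keyword checks, not real classifiers).
def toy_refuses(text: str) -> bool:
    return "ransomware" in text


def toy_risk(text: str) -> float:
    return 0.9 if "exploit" in text else 0.1


def toy_reviewer(text: str) -> bool:
    return "patch" in text


print(layered_check("summarize this changelog", toy_refuses, toy_risk, toy_reviewer))
print(layered_check("analyze this exploit to write a patch", toy_refuses, toy_risk, toy_reviewer))
```

In a design like this, the cheaper checks run first, so the more expensive reviewer is consulted only for the traffic that the classifier flags as high risk.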

Cyber Preparedness

According to OpenAI's internal Preparedness Framework, GPT-5.6 Sol remains below the 'Cyber Critical' threshold. While the model can identify exploitation primitives and bugs in browsers like Chromium and Firefox, it did not autonomously produce functional full-chain exploits during testing.