GPT-Realtime-2 Enables Natural Multi-Step Voice Agents
Use GPT-Realtime-2 for voice agents that reason like GPT-5, process a 128K-token context (4x the prior 32K), handle interruptions, and maintain long conversations without stalling. Enable preamble phrases like "let me check that" to fill silence during tool calls or multi-step tasks: users hear narration instead of dead air, which addresses a common production failure mode.
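A minimal sketch of wiring the preamble behavior into a session payload. The field names (`model`, `type`, `instructions`) and the prompt-level approach are illustrative assumptions, not a confirmed API schema:

```python
# Hypothetical session configuration for a GPT-Realtime-2 voice agent.
# Field names and the prompt-based preamble technique are assumptions,
# not confirmed API parameters.

def build_agent_session(instructions: str, enable_preamble: bool = True) -> dict:
    """Assemble a session payload that asks the model to narrate tool calls."""
    session = {
        "model": "gpt-realtime-2",
        "type": "voice-agent",
        "instructions": instructions,
    }
    if enable_preamble:
        # Instruct the model to speak a short filler phrase before
        # long-running tool calls, so the user never hears dead air.
        session["instructions"] += (
            " Before any tool call that may take more than a moment, "
            'say a brief preamble such as "let me check that".'
        )
    return session

config = build_agent_session("You are a booking assistant.")
```

The same effect could also be achieved server-side if the API exposes a dedicated preamble flag; the prompt-level version above is the most portable assumption.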
Tune reasoning across five levels (minimal, low, medium, high, xhigh; the default is low to keep latency down) to balance speed against depth: quick lookups stay fast, while complex bookings get full compute. Adjust tone dynamically: calm for troubleshooting, empathetic when the user is frustrated, upbeat after resolution. The model also handles industry-specific vocabulary, such as healthcare terminology.
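One way to apply the speed-versus-depth tradeoff is to route requests to a reasoning level by task type. The level names come from the text; the routing heuristic and task categories below are assumptions for illustration:

```python
# Sketch: map task complexity to one of the five documented reasoning levels.
# The task categories and thresholds are illustrative assumptions.
REASONING_LEVELS = ("minimal", "low", "medium", "high", "xhigh")

def pick_reasoning(task: str) -> str:
    quick = {"lookup", "faq", "status"}
    complex_tasks = {"booking", "troubleshooting", "multi-step"}
    if task in quick:
        return "minimal"   # fastest replies for simple queries
    if task in complex_tasks:
        return "high"      # full compute for multi-step work
    return "low"           # the documented default, tuned for low latency
```

In practice the routing signal might come from an intent classifier rather than a hand-written set, but the shape of the decision is the same.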
Benchmarks back up the gains: at high reasoning the model scores 96.6% on Big Bench Audio for audio reasoning (vs 81.4% for GPT-Realtime-1.5, +15.2 points); at xhigh it scores 48.5% on Audio MultiChallenge (vs 34.7%), which tests multi-turn dialogue, instruction following, and corrections. Pricing: $32 per 1M input tokens ($0.40 per 1M cached), $64 per 1M output tokens.
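The quoted token prices make per-session costs easy to estimate. A worked example (the token counts are hypothetical; the rates are the ones quoted above):

```python
# Quoted rates: $32 per 1M input tokens, $0.40 per 1M cached input tokens,
# $64 per 1M output tokens.
PRICE_INPUT = 32.00 / 1_000_000
PRICE_CACHED = 0.40 / 1_000_000
PRICE_OUTPUT = 64.00 / 1_000_000

def session_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Dollar cost of one session, splitting fresh vs cached input tokens."""
    return fresh_in * PRICE_INPUT + cached_in * PRICE_CACHED + out * PRICE_OUTPUT

# 50K fresh input + 50K cached input + 20K output:
# 1.60 + 0.02 + 1.28 ~= $2.90
cost = session_cost(50_000, 50_000, 20_000)
```

Note how heavily caching matters: the same 50K input tokens cost $1.60 fresh but only $0.02 cached.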
Dedicated Pipes for Translation and Streaming Transcription
Pipe speech through GPT-Realtime-Translate for live translation from 70+ input languages into 13 output languages at the speaker's pace, ideal for bilingual support or live events. It lacks agent reasoning (use GPT-Realtime-2 for that) and costs $0.034/min.
Stream transcripts in real time with GPT-Realtime-Whisper: tune the latency for faster partial text (lower delay) or higher quality (more delay). It beats batch Whisper for live captions, meeting notes, or continuous agent input, and at $0.017/min it keeps voice apps feeling responsive.
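The latency/quality knob described above can be sketched as a set of presets per use case. The `partial_interval_ms` field is a hypothetical name for the partial-result delay; only the tradeoff itself comes from the text:

```python
# Illustrative latency/quality presets for streaming transcription.
# "partial_interval_ms" is a hypothetical parameter name, not a
# confirmed API field.
PRESETS = {
    "captions": {"partial_interval_ms": 200,  "note": "low delay, rougher partials"},
    "meeting":  {"partial_interval_ms": 1000, "note": "balanced"},
    "archive":  {"partial_interval_ms": 3000, "note": "more delay, higher quality"},
}

def transcription_config(use_case: str) -> dict:
    """Build a transcription-session payload from a named preset."""
    preset = PRESETS[use_case]
    return {"model": "gpt-realtime-whisper", "type": "transcription", **preset}
```

Live captions favor the low-delay end; meeting notes or archival transcripts can afford more delay in exchange for cleaner text.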
Production Setup: Session Types and Controls
Select a voice-agent session (reasoning responses), a translation session (language pipe), or a transcription session (STT only). New voices Cedar and Marin are available. The API is now generally available: test in the Playground, then deploy without beta risk. Full details: OpenAI announcement.
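The three session types map one-to-one onto the three models covered above. A hedged sketch of that dispatch; the payload shape is an assumption, while the model-to-type pairing follows the text:

```python
# Sketch: choose a model by session type. The dict shape is illustrative,
# not a confirmed request schema.
def make_session(kind: str) -> dict:
    table = {
        "voice-agent":   {"model": "gpt-realtime-2"},        # reasoning responses
        "translation":   {"model": "gpt-realtime-translate"},# language pipe
        "transcription": {"model": "gpt-realtime-whisper"},  # STT only
    }
    if kind not in table:
        raise ValueError(f"unknown session type: {kind}")
    return {"type": kind, **table[kind]}
```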