OpenAI's Realtime Voice Models Enable GPT-5 Reasoning Live

Build Voice Agents with GPT-5 Reasoning at Low Latency

OpenAI's GPT-Realtime-2 handles complex live voice tasks, including tracking context, making tool calls, and recovering from interruptions, with reasoning on par with GPT-5. The context window grows from 32k to 128k tokens, which supports longer dialogues. The model can run tool calls in parallel and give audible feedback such as 'let me check that', or a short preamble ('one moment'), to buy thinking time without leaving the caller in silence. Reasoning effort is adjustable across five levels, from minimal to xhigh, with low as the default for speed. The model can also hold a calm tone while problem-solving or show empathy with frustrated users, and it handles specialized terms (medical vocabulary, proper names) well.

Benchmarks show gains: 96.6% accuracy on Big Bench Audio at the high setting (vs 81.4% for the prior model) and a 48.5% pass rate on Audio MultiChallenge at xhigh (vs 34.7%). Overall it beats GPT-Realtime-1.5, which makes it a more reliable base for production agents.
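The idea of speaking a preamble while tool calls run in parallel can be sketched in plain Python. This is only an illustration of the pattern, not the Realtime API: the tools `check_calendar` and `check_weather`, the `speak` callback, and the simulated delays are all hypothetical stand-ins.

```python
import asyncio

# Hypothetical tools; these names are illustrative and not part of any OpenAI API.
async def check_calendar(day: str) -> str:
    await asyncio.sleep(0.05)  # simulated lookup latency
    return f"calendar free on {day}"

async def check_weather(day: str) -> str:
    await asyncio.sleep(0.05)  # simulated lookup latency
    return f"weather clear on {day}"

async def handle_request(day: str, speak) -> list[str]:
    # Speak the preamble first so the caller hears something while tools run.
    speak("One moment, let me check that.")
    # Run both tool calls concurrently instead of one after the other.
    results = await asyncio.gather(check_calendar(day), check_weather(day))
    # Report each result once all tools have finished.
    for line in results:
        speak(line)
    return list(results)

spoken: list[str] = []  # stands in for the audio output channel
results = asyncio.run(handle_request("Friday", spoken.append))
```

Because `asyncio.gather` preserves argument order, the results come back in the order the tools were listed, regardless of which one finishes first.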

Three Patterns for Voice-Driven Products

The models combine into three patterns for real-world apps. Voice-to-Action: the user speaks a request, and the AI reasons about it, calls tools, and carries it out (for example, making a booking). Systems-to-Voice: an app speaks contextual guidance to the user (for example, a travel app that reroutes a trip after a delay and confirms luggage handling). Voice-to-Voice: people converse across languages (Deutsche Telekom is testing this for customer support). These capabilities are set to roll out to ChatGPT audio soon, positioning voice as a primary interface for customer support, sales, and education.

Add Translation and Transcription for Workflows

GPT-Realtime-Translate covers more than 70 input languages and 13 output languages, preserving meaning despite accents or mid-conversation language switches, which makes it a fit for global support and events. GPT-Realtime-Whisper streams low-latency captions for meetings and classrooms, and can generate live notes and summaries that speed up follow-ups in fields such as healthcare and recruiting. All three models are live now in the Realtime API (EU data residency is supported) and in the Playground, where they can be tested in combination for hybrid agents.

Token/Minute Pricing for Scalable Deployment

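GPT-Realtime-2 costs $32 per million audio input tokens ($0.40 per million when cached) and $64 per million output tokens. GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute. Enterprise privacy commitments apply, and the pricing is pitched as low enough to make high-volume voice products viable alongside text-only ones.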
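To make these rates concrete, here is a minimal cost estimator. The prices are the ones quoted above; the function names and the example token counts are hypothetical, and the sketch assumes the $0.40 cached rate is per million cached input tokens.

```python
# Prices in USD as quoted in the text above.
AUDIO_INPUT_PER_M = 32.00    # per million audio input tokens
CACHED_INPUT_PER_M = 0.40    # assumed: per million cached input tokens
OUTPUT_PER_M = 64.00         # per million output tokens
TRANSLATE_PER_MIN = 0.034    # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017      # GPT-Realtime-Whisper, per minute

def realtime_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the GPT-Realtime-2 cost of one session in USD."""
    total = (
        input_tokens * AUDIO_INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000

def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Estimate the cost of a per-minute model in USD."""
    return minutes * rate_per_min

# Hypothetical session: 10,000 fresh input, 50,000 cached, 5,000 output tokens.
session = realtime_cost(10_000, 50_000, 5_000)
translate_hour = per_minute_cost(60, TRANSLATE_PER_MIN)
whisper_hour = per_minute_cost(60, WHISPER_PER_MIN)
```

Under these assumptions the example session comes to about $0.66, an hour of translation to about $2.04, and an hour of transcription to about $1.02.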
