Rethinking AI Interaction as Full Duplex

Current LLM assistants operate in half duplex: the user speaks, the AI listens to the full utterance, then responds, much like texting. Thinking Machines Lab, founded in 2025 by former OpenAI CTO Mira Murati, introduces "interaction models" for full-duplex audio, in which the AI processes input and generates output simultaneously, like a phone call. This allows interruptions and natural conversational flow, making exchanges feel human rather than scripted.
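The half-duplex versus full-duplex distinction can be sketched in a few lines of asyncio. This is a toy illustration, not TML's implementation: audio frames are stand-in strings, and the overlap of listening and speaking is the only point being made.

```python
import asyncio

async def frames():
    """Simulated user audio: three frames arriving 10 ms apart."""
    for i in range(3):
        await asyncio.sleep(0.01)
        yield f"frame-{i}"

async def full_duplex(user_frames):
    """Full duplex: begin emitting output while input is still arriving.
    A half-duplex system would instead drain user_frames() completely
    before producing its first output frame."""
    heard, spoken = [], []

    async def listen():
        async for f in user_frames():
            heard.append(f)

    async def speak():
        # Start speaking after the FIRST input frame, not after the last.
        while not heard:
            await asyncio.sleep(0.001)
        while len(spoken) < 3:
            spoken.append(f"ack-{len(spoken)}")
            await asyncio.sleep(0.005)

    # Listening and speaking run concurrently -- the defining property
    # of a full-duplex interaction model.
    await asyncio.gather(listen(), speak())
    return heard, spoken

heard, spoken = asyncio.run(full_duplex(frames))
print(heard, spoken)
```

In a real system the two coroutines would wrap microphone capture and incremental speech synthesis; the structural point is that neither waits for the other to finish.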

The core model, TML-Interaction-Small, achieves 0.40-second response times, matching natural human speech latency and outperforming comparable OpenAI and Google models. Benchmarks on the company's site show superior speed without an apparent loss of quality, suggesting interactivity can be native to the model architecture rather than an add-on, though these figures are self-reported.

Benchmarks Validate Speed Gains

Thinking Machines claims impressive metrics: TML-Interaction-Small responds in 0.40 seconds end to end, significantly faster than competitors. The full-duplex setup processes streaming audio in real time, letting the AI react mid-sentence if needed. While the benchmarks look strong, they remain untested in broad real-world use; success hinges on whether the experience delivers on these technical promises.
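End-to-end figures like the reported 0.40 seconds are typically measured as time to first audio: the gap between the end of the user's utterance and the first audible output chunk. A minimal harness for that measurement might look like the sketch below, where stub_model_stream is a hypothetical stand-in for any streaming speech model (the 50 ms delay is arbitrary, chosen only to keep the demo fast).

```python
import time

def stub_model_stream(prompt_audio, delay_s=0.05):
    """Hypothetical streaming model: yields audio chunks after a fixed
    processing delay. prompt_audio is a placeholder for captured speech."""
    time.sleep(delay_s)
    for chunk in ("chunk-0", "chunk-1"):
        yield chunk

def first_response_latency(stream):
    """Measure time-to-first-audio: start the clock at end of user speech,
    stop it when the first output chunk arrives."""
    t_end_of_speech = time.perf_counter()
    first_chunk = next(stream)  # generator is lazy; work happens here
    return time.perf_counter() - t_end_of_speech, first_chunk

latency, chunk = first_response_latency(stub_model_stream(b"user-audio"))
print(f"time to first audio: {latency * 1000:.0f} ms")
```

Swapping the stub for a real model endpoint would give a directly comparable number, which is how vendor latency claims can be sanity-checked once access opens.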

Path to Production

This is research, not a public product: a limited preview launches in the next few months, with a wider release planned for later in 2025. Builders can anticipate integrating it into voice agents or apps that need fluid dialogue, but should evaluate trade-offs such as compute cost and error handling during live interruptions once access is available.
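One of those trade-offs, handling a live interruption cleanly, reduces to cancelling in-flight output without leaving buffered audio behind. A minimal asyncio sketch of barge-in handling, with all names hypothetical:

```python
import asyncio

async def speak(frames_out):
    """Stream reply audio frame by frame; cancellable mid-utterance."""
    try:
        for i in range(100):
            frames_out.append(f"out-{i}")
            await asyncio.sleep(0.005)
    except asyncio.CancelledError:
        # On barge-in, flush any queued audio instead of finishing
        # the sentence -- the error-handling case worth testing.
        frames_out.append("<flushed>")
        raise

async def converse():
    frames_out = []
    speaker = asyncio.create_task(speak(frames_out))
    await asyncio.sleep(0.02)   # user starts talking ~20 ms in
    speaker.cancel()            # barge-in: stop the reply immediately
    try:
        await speaker
    except asyncio.CancelledError:
        pass                    # expected outcome of a cancelled turn
    return frames_out

frames = asyncio.run(converse())
print(frames)
```

A production agent would also need to decide what the interrupted turn means for conversation state (was the half-spoken reply "said" or not?), which is exactly the kind of behavior only live use will validate.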