Democratizing Robotics through Accessibility
Hugging Face developed the Reachy Mini to address the high barrier to entry in robotics, where most platforms are prohibitively expensive ($50k+) and designed as rigid humanoid imitations. By shipping the robot unassembled for $300–$450, the project prioritizes repairability, hackability, and creative exploration over mimicking human biology. The platform is designed to be an expressive, non-humanoid interface that encourages users to experiment with new interaction patterns, such as 3D-printing custom parts or creating novel behaviors like "purring" when petted.
Optimizing Voice AI for Real-Time Interaction
To enable fluid human-robot conversation, the team built a modular voice stack consisting of Parakeet for transcription, Qwen 3.5 27B for reasoning, and an optimized Qwen3-TTS. A critical challenge was the original Qwen3-TTS implementation, which failed to meet the latency requirements for real-time agents.
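The modularity described above can be sketched as three swappable stages behind one interface. This is only an illustration: `VoicePipeline` and its field names are hypothetical, and the lambdas are stand-ins rather than calls into Parakeet, Qwen, or Qwen3-TTS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Three independent stages; any one can be replaced without touching the others."""
    transcribe: Callable[[bytes], str]   # speech-to-text stage (hypothetical signature)
    reason: Callable[[str], str]         # LLM stage (hypothetical signature)
    synthesize: Callable[[str], bytes]   # text-to-speech stage (hypothetical signature)

    def respond(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        reply = self.reason(text)
        return self.synthesize(reply)


# Stand-in stages so the sketch runs without any model installed.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    reason=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Because each stage is just a callable, a faster TTS implementation could be dropped in by replacing `synthesize` alone.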
Andres Marafioti achieved a 5.8x real-time performance boost (reducing time-to-first-audio to under 200ms) by addressing three primary bottlenecks:
- Streaming: Implementing streaming to avoid waiting for the full audio generation.
- CPU-GPU Overhead: Eliminating the constant data transfer between CPU and GPU by compiling the model for direct GPU execution.
- KV Cache Management: Replacing the dynamic KV cache with a static one, which allowed for the use of CUDA graph captures to accelerate generation.
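A toy sketch of two of these ideas, streaming and a static cache, is given here in plain Python. It is not the Qwen3-TTS code: `StaticKVCache`, `stream_audio`, and the fake decoder step are hypothetical, and no GPU is involved. The point is that the cache is allocated once at a fixed size (the property that makes CUDA graph capture possible on real hardware) and that audio is yielded chunk by chunk instead of after the full generation.

```python
from typing import Iterator, List


class StaticKVCache:
    """Fixed-capacity key/value store: buffers are allocated once and never resized."""

    def __init__(self, max_steps: int, width: int) -> None:
        self.max_steps = max_steps
        self.width = width
        # Allocate every slot up front so the buffer layout never changes.
        self.keys: List[List[float]] = [[0.0] * width for _ in range(max_steps)]
        self.values: List[List[float]] = [[0.0] * width for _ in range(max_steps)]
        self.length = 0  # number of slots filled so far

    def append(self, key: List[float], value: List[float]) -> None:
        if self.length >= self.max_steps:
            raise OverflowError("static cache is full")
        if len(key) != self.width or len(value) != self.width:
            raise ValueError("key/value width does not match the cache")
        # Write into the existing slot instead of growing the buffer.
        self.keys[self.length][:] = key
        self.values[self.length][:] = value
        self.length += 1


def stream_audio(num_steps: int, cache: StaticKVCache,
                 chunk_steps: int = 4) -> Iterator[List[float]]:
    """Yield audio in small chunks as soon as each chunk is ready."""
    chunk: List[float] = []
    for step in range(num_steps):
        frame = [float(step)] * cache.width  # stand-in for one decoder step
        cache.append(frame, frame)
        chunk.append(frame[0])
        if len(chunk) == chunk_steps:
            yield chunk  # the caller can start playback here
            chunk = []
    if chunk:
        yield chunk  # flush the final partial chunk
```

With a generator like this, time-to-first-audio depends on the first `chunk_steps` decoder steps rather than on the length of the whole utterance.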
Infrastructure and Deployment Strategy
The team manages a large fleet of robots with a load-balanced architecture that separates LLM inference from conversation nodes. User activity on conversation nodes varies widely, so decoupling the LLM endpoints lets compute scale with actual conversation volume rather than with the number of connected devices. The system also performs echo cancellation and emits partial transcriptions every 150 ms, which lets the robot react mid-sentence and keeps the interaction responsive and natural.
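The scaling argument can be made concrete with a small sketch. Everything here is assumed for illustration: the `LLMEndpoint` type, the least-loaded routing rule, and the fleet and capacity numbers are not taken from the team's deployment.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class LLMEndpoint:
    name: str
    in_flight: int = 0  # requests currently being served


def pick_endpoint(endpoints: List[LLMEndpoint]) -> LLMEndpoint:
    """Route the next request to the least-loaded endpoint (one possible policy)."""
    return min(endpoints, key=lambda e: e.in_flight)


def replicas_needed(active_conversations: int, per_replica_capacity: int) -> int:
    """Size the LLM pool from live conversations, keeping at least one replica."""
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    return max(1, math.ceil(active_conversations / per_replica_capacity))


# Hypothetical fleet: 1000 connected robots, 40 of them in conversation,
# and each LLM replica assumed to handle 8 concurrent conversations.
sized_by_conversations = replicas_needed(40, 8)    # 5 replicas
sized_by_devices = replicas_needed(1000, 8)        # 125 replicas
```

Under these assumed numbers, sizing by connected devices would provision 25 times more LLM capacity than the live conversations require.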