Flagship Text Models for Coding and Agents
MiniMax-M2.7 enables model self-iteration for complex tasks; pair with M2.7-highspeed for unchanged quality at higher speeds. M2.5 delivers top performance and cost-efficiency on intricate workloads, with M2.5-highspeed variant boosting velocity. M2-her specializes in role-playing and multi-turn dialogues. Older models like M2.1 excel in multilingual coding and agent workflows, accessible via Anthropic-compatible APIs—integrate directly into production pipelines for efficient text generation without custom prompt tweaks.
Speech Models for Natural Voice Output
Use Speech-2.8-HD to replicate real tones and timbre precisely; Speech-2.8-Turbo prioritizes speed with vivid expression. Speech-2.6-HD offers superior audio quality and rhythm at faster rates, while Speech-2.6-Turbo cuts latency for responsive apps. Legacy Speech-02-HD shines in prosody and stability for high-fidelity cloning; Speech-02-Turbo enhances small-language support—deploy these for low-delay TTS in voice agents or interactive apps, balancing quality and real-time needs.
Video, Image, and Music Generation
Hailuo 2.3 breaks through in motion, expressions, physics, and prompt adherence for text-to-video; Hailuo 2.3-Fast accelerates image-to-video with strong fidelity at lower cost. Hailuo 02 generates native 1080p videos with state-of-the-art physics. Image-01 handles detailed T2I/I2I; image-01-live boosts hand-drawn/cartoon styles. Music-2.5+ unlocks pure instrumentals and genre fusion; music-2.5 masters detailed orchestration—leverage for creative apps, starting with token plans for multimodal builds.