Xiaomi's 1T MoE AI Tops Charts at $1/M Tokens
Xiaomi's Mio V2 Pro (1T parameters, 42B active) reaches the global top 10, scoring 78% on SWE-bench Verified and 61.5 on Clawal at $1/M input and $3/M output tokens, roughly 3-5x cheaper than Claude. It excels at creative and coding tasks but remains weak on frontier math.
Xiaomi Mio V2 Pro Delivers Frontier Performance at a Fraction of the Cost
Xiaomi's 1T+-parameter MoE model (42B active per request, hybrid 7:1 attention, 1M-token context, multi-token prediction) stealth-launched as Hunter Alpha, topping OpenRouter usage and lifting Xiaomi's stock 6%.

Benchmarks: 8th on the Artificial Analysis Intelligence Index (2nd among Chinese models); 78% on SWE-bench Verified (near Claude 3.5 Sonnet's 79.6% and Opus's 80.8%); 61.5 on Clawal (vs. Opus's 66.3); 81.0 on Pinchbench (tied with top-tier models).

Pricing: $1/M input and $3/M output tokens, versus Claude Sonnet's $3.15/M input and $15/M output and Opus's $5.25/$15, a gap that changes agent economics.

Strengths: generated a 3,000-word Mesoamerican story with natural dialogue and an emotional arc; built a full 2.5D stealth game with sound and MIDI from a single prompt; set up an OpenClaw agent environment in one click, sustaining 30-minute sessions.

Weaknesses: reframed legal contradictions without flagging them; failed frontier math problems even with step-by-step prompting, burning tokens until it froze.

Mio V2 Omni adds native end-to-end vision, audio, and video, with notably strong dashcam analysis. The model is closed-source for now, with hints of an open release later.
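To see how the pricing gap plays out in agent economics, here is a quick cost comparison using the published per-million-token prices; the session's token counts (2M input from repeated context reads, 200K output) are illustrative assumptions, not figures from the article.

```python
# Agent-session cost at per-million-token prices: $1/$3 for Mio V2 Pro,
# $3.15/$15 for Claude Sonnet, $5.25/$15 for Opus (input/output).

def session_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one session at the given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical long agent run: 2M input tokens, 200K output tokens.
mio = session_cost(2_000_000, 200_000, 1.00, 3.00)      # $2.60
sonnet = session_cost(2_000_000, 200_000, 3.15, 15.00)  # $9.30
opus = session_cost(2_000_000, 200_000, 5.25, 15.00)    # $13.50

print(f"Mio ${mio:.2f} vs Sonnet ${sonnet:.2f} ({sonnet/mio:.1f}x) "
      f"vs Opus ${opus:.2f} ({opus/mio:.1f}x)")
```

On this (assumed) workload the gap is about 3.6x against Sonnet and 5.2x against Opus per session, which compounds quickly across thousands of agent runs.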
Mistral Voxil TTS Enables Fast, Local Voice Cloning
The 4B-parameter Voxil splits text processing, phoneme shaping, and audio synthesis into separate stages, achieving 70ms latency (500-character input to 10s of audio) and 9.7x realtime speed across nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, Arabic) while capturing dialects and rhythm. It clones voices from 3 seconds of audio, including across languages. It beats ElevenLabs Flash 2.5 on multilingual human preference (68.4%) and matches V3 on voice similarity. Weights are open under CC-BY-NC, and quantized builds run on smartphones and laptops, enabling offline, private tools instead of cloud APIs. Voxil fills the audio gap in Mistral's pipeline for expressive, real-time assistants.
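A quick sanity check on what those speed figures mean in practice: a realtime factor above 1 means audio is generated faster than it plays back, so compute time is audio duration divided by the factor. The numbers below just restate the reported figures.

```python
# Compute time implied by a realtime factor (RTF > 1 = faster than playback).

def generation_time(audio_seconds: float, realtime_factor: float) -> float:
    """Seconds of compute needed to synthesize the given audio duration."""
    return audio_seconds / realtime_factor

# The 10s clip from a 500-char input: ~70ms to first audio,
# ~1.03s of total compute at 9.7x realtime.
total = generation_time(10.0, 9.7)
print(f"10s clip synthesized in {total:.2f}s; first audio after ~0.07s")
```

In other words, the model finishes the whole clip in about a second while playback takes ten, which is what makes live, on-device assistants plausible.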
Nvidia ProRL Splits Agent Training for 2x SWE Gains
ProRL separates agent task execution, run as an independent service, from training, and splits the workflow into task preparation, execution, and evaluation stages. The design cuts terminal delays by roughly 50%, enables direct communication, keeps state consistent (no reprocessing), and scales inference intelligently. Gains: a Qwen model improved from 9.6% to 18.0% on SWE-bench Verified, and a larger model from 15.4% to 23.6%, improvements that come from the architecture rather than model size. It runs on shared clusters without admin access, letting labs and companies scale up complex agent training.
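The decoupling described above can be sketched as a producer/consumer split: the trainer submits tasks to a queue and consumes scored rollouts, while a separate service runs the prepare/execute/evaluate stages. The stage names follow the article's description; everything else (queues, threads, the toy stage bodies) is an illustrative assumption, not ProRL's actual implementation.

```python
# Sketch: agent execution as an independent service behind a queue, so the
# trainer never blocks on slow terminal interactions.
import queue
import threading

task_q: "queue.Queue" = queue.Queue()
result_q: "queue.Queue" = queue.Queue()

def prepare(task: str) -> str:
    # Stage 1: set up the environment once, so state stays consistent
    # and is never reprocessed mid-rollout.
    return f"env({task})"

def execute(env: str) -> str:
    # Stage 2: the agent rollout (tool calls, terminal use) happens here,
    # inside the service, isolated from the training loop.
    return f"trajectory in {env}"

def evaluate(trajectory: str) -> float:
    # Stage 3: score the rollout (e.g. did the patch pass tests).
    return 1.0 if "env" in trajectory else 0.0

def execution_service() -> None:
    while True:
        task = task_q.get()
        if task is None:  # shutdown signal
            break
        traj = execute(prepare(task))
        result_q.put((task, traj, evaluate(traj)))

svc = threading.Thread(target=execution_service)
svc.start()
for t in ["fix-bug-1", "fix-bug-2"]:
    task_q.put(t)        # trainer submits work and moves on
task_q.put(None)
svc.join()

while not result_q.empty():
    task, traj, reward = result_q.get()
    print(task, reward)  # trainer consumes scored rollouts for the RL update
```

Because the service owns the environment lifecycle, it can run on a shared cluster as an ordinary process, which matches the article's claim that no admin access is needed.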