Agent Frameworks Unlock 10x Research Speed
During the 2026 Lunar New Year, Luo Fuli discovered OpenClaw, Peter Steinberger's open-source AI agent framework, chatting with it from 2 AM to 6 AM via Claude Opus 4.6 and burning $1,000 on day one. Its "finely orchestrated context" (appending the current time to each conversation, plus configs like search.md) felt soulful, handling daily tasks, team curiosity strategies, and even research handoffs. She redesigned its memory and multi-agent systems, then tested it on weaker models: Sonnet, domestic LLMs, MiMo-V2-Pro (then mid-training), MiMo-V2-Flash, and a 3B on-device model. Under OpenClaw these small models pulled off feats previously impossible for them, the framework compensating for their intrinsic weaknesses. A nominally mandatory (but unenforced) target of 100+ conversation turns sparked agent group chats running on Mac Minis; within 10 minutes after the holiday, 100+ people had racked up 999+ unread messages, shifting the team's mindset toward mutual enhancement of agents and models. Result: research that once took 30-40 weeks now takes 3-4 weeks.
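The interview describes OpenClaw's context orchestration only at a high level. A minimal sketch of the idea (the function name `build_context` and directory layout are hypothetical illustrations, not OpenClaw's actual API): per conversation, the framework stitches the base prompt together with the current time and any markdown config files such as search.md.

```python
from datetime import datetime, timezone
from pathlib import Path

def build_context(base_prompt: str, config_dir: Path) -> str:
    """Assemble a per-conversation context: base prompt, current time,
    and any markdown config files (e.g. search.md) found in config_dir."""
    parts = [base_prompt]
    # Appending the current timestamp lets the agent reason about "now"
    # instead of hallucinating a date.
    parts.append(f"Current time: {datetime.now(timezone.utc).isoformat()}")
    for cfg in sorted(config_dir.glob("*.md")):
        parts.append(f"--- {cfg.name} ---\n{cfg.read_text()}")
    return "\n\n".join(parts)
```

The point of this pattern is that orchestration quality, not model size, does much of the work, which is why small models performed above their weight under the same scaffolding.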
Flat Structure Maximizes Creativity Over Experience
Xiaomi MiMo's 100-person team has no job titles, no sub-teams, and no strict deadlines, even for 1T-parameter models, challenging big-tech norms. Luo prioritizes "initial checkpoint" potential over experience, hiring college sophomores and juniors (hence a high intern ratio) who, in the right environment, ramp up in 1-4 months. Only 20-40 people iterate the core pipeline, and most had no large-model experience before joining (7B/14B at most). The egalitarian setup fosters the diverse perspectives that post-training shifts demand, avoiding rigid divisions that kill cross-interest creativity; passion is sustained through hands-on exposure, and group chats multiply imagination. Debugging 1T-scale training instabilities (loss spikes, expert load imbalance in MoE), a "dark art" where idle GPUs can waste millions per day, succeeds through small agile teams free of deadline pressure, though self-doubt persists.
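The two instabilities named above have standard first-line checks, even if the interview doesn't detail MiMo's. A sketch of both, with illustrative thresholds (the window size, spike ratio, and function names are assumptions, not Xiaomi's actual tooling): flag a step whose loss blows past a multiple of the rolling median, and measure MoE routing skew as the coefficient of variation of tokens per expert.

```python
from collections import deque
from statistics import mean, median, pstdev

def spike_detector(window: int = 100, ratio: float = 2.0):
    """Return a closure that flags a training step whose loss exceeds
    `ratio` x the rolling median of recent losses."""
    history = deque(maxlen=window)
    def check(loss: float) -> bool:
        # Require some history before judging, or every early step spikes.
        is_spike = len(history) >= 10 and loss > ratio * median(history)
        history.append(loss)
        return is_spike
    return check

def expert_imbalance(token_counts: list[int]) -> float:
    """Coefficient of variation of tokens routed per MoE expert.
    0.0 means perfectly balanced routing; larger means a few experts
    are hogging the batch."""
    m = mean(token_counts)
    return pstdev(token_counts) / m if m else 0.0
```

In practice a spike would trigger a checkpoint rollback or batch skip, and persistent imbalance points at the router's auxiliary load-balancing loss, but those responses are run-specific.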
Architecture and Pricing for Agent Era, Plus AGI Bet
MiMo-V2 chose MTP (multi-token prediction) plus Hybrid Attention (sliding windows to cap KV-cache growth, with MTP recovering decode speed) over MLA, arguing that MLA's compute-memory equilibrium blocks further gains and hurts the long-context reasoning agents need. Hybrid Attention hits 100-150 TPS (Flash) and 60-100 TPS (Pro), enabling contexts from 1M toward 100M tokens. Pricing shifted from cost-based (0.3 RMB per million output tokens for Flash) to value-based, reflecting post-training optimizations for agents. On multimodality, Luo is skeptical: MiMo V2 Omni (smaller than Pro) excels in perception and EQ despite flat benchmarks, and agents can orchestrate specialist models without needing intrinsic multimodal boosts. On AGI: 20% done today, 60-70% within this year, full AGI in 2 years via agents that first absorb human intelligence and then self-bootstrap; systems beat geniuses.