Kimi K2.6: Open MoE Model Tops Agentic Coding Benchmarks
Moonshot's 1T-param MoE Kimi K2.6 open-sources native multimodal agents that excel at 13-hour autonomous coding (185% throughput gains) and scale to 300 sub-agents over 4,000 steps, deployable via vLLM.
Efficiently Deploy 1T-Param Multimodal MoE for Agentic Workloads
Kimi K2.6 activates only 32B of its 1T parameters per token using a Mixture-of-Experts architecture with 384 experts (8 routed per token plus 1 shared), 61 layers, and Multi-head Latent Attention. Native vision via 400M-param MoonViT handles images/videos alongside 256K-token context and 160K vocab. Run on vLLM, SGLang, or KTransformers with transformers >=4.57.1 <5.0.0; reuse K2.5 configs. API offers Thinking mode (temp=1.0, chain-of-thought for coding/agents, enable preserve_thinking for multi-turn state) or Instant mode (disable via {'thinking': {'type': 'disabled'}} or chat_template_kwargs {"thinking": False}, temp=0.6 top-p=0.95). Weights on Hugging Face under Modified MIT.
Leads HLE-Full with tools at 54.0 (vs GPT-5.4 52.1, Claude Opus 4.6 53.0), SWE-Bench Pro 58.6 (vs GPT-5.4 57.7), Terminal-Bench 2.0 66.7, LiveCodeBench v6 89.6, BrowseComp swarm 86.3, DeepSearchQA 92.5 f1.
Execute 13-Hour Autonomous Coding with 4K+ Tool Calls
For long-horizon tasks, K2.6 resolves GitHub issues and optimizes code over hours without oversight. Example: Downloaded/deployed Qwen3.5-0.8B on Mac, implemented Zig inference, iterated 14x over 12+ hours/4K tool calls to boost throughput 15→193 tokens/sec (20% > LM Studio). Another: Refactored 8-year exchange-core engine (1K+ tool calls, 13 hours, 4K+ LOC changes); analyzed flame graphs, shifted topology 4ME+2RE→2ME+1RE for 185% medium throughput (0.43→1.24 MT/s) and 133% perf (1.23→2.86 MT/s). Internal Kimi Code Bench and Claw Bench show gains in coding, research, scheduling, memory over K2.5; RL team ran 5-day proactive agent for ops.
Scale Complex Tasks via 300-Sub-Agent Swarms and Claw Groups
Agent Swarm decomposes tasks across 300 heterogeneous sub-agents/4K steps (up from K2.5's 100/1.5K), parallelizing search/research/writing/generation for docs/sites/slides/spreadsheets. Skills feature ingests PDF/spreadsheet/slide/Word to extract structure/style as reusable templates—e.g., CV→100 customized resumes; 30 LA stores→landing pages; astrophysics paper→40-page report +20K-entry dataset/14 charts. Claw Groups (preview) coordinates external agents/humans: K2.6 matches tasks by skills/tools, reassigns on failure, manages lifecycle. Used internally for content/Demo/benchmark/social/video production. Build persistent agents like OpenClaw/Hermes for 24/7 cross-app ops.