Model Architecture and Design

Kimi K2.7-Code is a 1T-parameter Mixture-of-Experts (MoE) model designed specifically for agentic software engineering. It activates 32B parameters per token, using 384 experts (8 selected per token, plus 1 shared expert) across 61 layers. The model incorporates a 400M-parameter MoonViT vision encoder for multimodal input and supports a 256K-token context window. It is built for server-class deployment, with weights available on Hugging Face under a Modified MIT license.
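To make the sparsity concrete, the figures above can be turned into ratios. This is only arithmetic on the numbers quoted in the paragraph; whether the shared expert is counted inside the 384 is not stated, so the sketch assumes it is separate.

```python
# Figures quoted in the paragraph above.
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B activated per token
ROUTED_EXPERTS = 384               # assumption: excludes the shared expert
SELECTED_PER_TOKEN = 8
SHARED_EXPERTS = 1

# Share of all parameters that take part in one token's forward pass.
param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Share of the routed experts chosen for one token.
expert_fraction = SELECTED_PER_TOKEN / ROUTED_EXPERTS

# Experts that actually run for one token (routed picks plus shared).
experts_run_per_token = SELECTED_PER_TOKEN + SHARED_EXPERTS

print(f"active parameters: {param_fraction:.1%}")
print(f"routed experts selected: {expert_fraction:.2%}")
print(f"experts run per token: {experts_run_per_token}")
```

Roughly 3% of the parameters are active for any single token, which is why a 1T model can have per-token compute closer to that of a 32B dense model while still needing server-class memory to hold all the weights.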

Performance and Reasoning Efficiency

Moonshot reports significant gains over the previous K2.6 version, most notably a 21.8% relative improvement on Kimi Code Bench v2 (from 50.9 to 62.0, a gain of 11.1 points). The model is also reported to be competitive with frontier models such as Claude Opus 4.8, particularly on the MCP Mark Verified benchmark (81.1 vs 76.4).
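The 21.8% figure is a relative gain, not a point difference. A quick check using only the scores quoted above:

```python
def relative_gain_pct(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Kimi Code Bench v2: K2.6 -> K2.7-Code, as reported.
point_gain = round(62.0 - 50.9, 1)
rel_gain = round(relative_gain_pct(50.9, 62.0), 1)

# MCP Mark Verified: gap between the two quoted scores, in points.
mcp_gap = round(81.1 - 76.4, 1)

print(point_gain, rel_gain, mcp_gap)
```

Keeping the two measures apart matters when comparing benchmarks: an 11.1-point gain from a base of 50.9 is a much larger relative jump than the same gain from a base of 80.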

A core advancement is the 30% reduction in reasoning-token usage. Because agentic coding workflows involve iterative planning, debugging, and tool execution, this efficiency gain directly translates to lower operational costs, faster step execution in CLI environments, and a greater capacity for complex tasks before reaching context limits.
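A rough sketch of what a 30% reduction means for a multi-step agent loop. The per-step token count and the token budget below are hypothetical values chosen for illustration, not measurements, and the model is simplified: it treats reasoning tokens as the only thing consuming the budget.

```python
REDUCTION_PCT = 30  # reported reduction in reasoning-token usage

def reduced_tokens(baseline_tokens: int, reduction_pct: int = REDUCTION_PCT) -> int:
    """Reasoning tokens per step after applying the reduction."""
    return baseline_tokens * (100 - reduction_pct) // 100

def steps_within_budget(budget_tokens: int, tokens_per_step: int) -> int:
    """How many whole steps fit in a fixed token budget."""
    return budget_tokens // tokens_per_step

# Hypothetical inputs (assumptions, not from the source).
baseline_per_step = 4_000   # reasoning tokens per agent step before the reduction
budget = 256_000            # token budget, taken here as 256 * 1000

after_per_step = reduced_tokens(baseline_per_step)
steps_before = steps_within_budget(budget, baseline_per_step)
steps_after = steps_within_budget(budget, after_per_step)

print(after_per_step, steps_before, steps_after)
```

Under these assumptions the same budget covers about 42% more steps (1 / 0.7 is roughly 1.43), which is where the "more capacity before hitting context limits" claim comes from.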

Operational Constraints

K2.7-Code imposes several hard constraints on how it is called:

  • Mandatory Thinking: The model's 'thinking' mode cannot be disabled.
  • Fixed Sampling: Users must adhere to fixed parameters (temperature 1.0, top_p 0.95, n=1, penalties 0.0). Deviating from these settings will result in API errors.
  • Tooling: When using tool-calling features, developers must preserve the reasoning_content in the message history to prevent errors in subsequent turns.
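The constraints above can be enforced on the client side before a request is sent. The following is a minimal sketch, not Moonshot's client: it assumes an OpenAI-style chat payload made of plain dicts, the model identifier is a placeholder, and the key names presence_penalty and frequency_penalty are an assumed reading of "penalties". Only reasoning_content is a field name taken from the text.

```python
import copy

MODEL_NAME = "kimi-k2.7-code"  # placeholder identifier, not confirmed by the source

# Fixed sampling values from the list above; penalty key names are assumed.
REQUIRED_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}

def build_request(messages, tools=None, **overrides):
    """Build a payload dict, rejecting any deviation from the fixed sampling values."""
    bad = [k for k, v in overrides.items()
           if k in REQUIRED_SAMPLING and v != REQUIRED_SAMPLING[k]]
    if bad:
        raise ValueError(f"fixed sampling parameters cannot be changed: {bad}")
    payload = {"model": MODEL_NAME, "messages": list(messages),
               **overrides, **REQUIRED_SAMPLING}
    if tools:
        payload["tools"] = tools
    return payload

def append_assistant_turn(history, assistant_message):
    """Append the assistant message unchanged, so reasoning_content is kept."""
    history.append(copy.deepcopy(assistant_message))
    return history

def turns_missing_reasoning(history):
    """Indices of assistant tool-calling turns that lack reasoning_content."""
    return [i for i, m in enumerate(history)
            if m.get("role") == "assistant"
            and m.get("tool_calls")
            and not m.get("reasoning_content")]
```

A typical loop would call append_assistant_turn with the full message returned by the API, then run turns_missing_reasoning on the history before the next build_request. The common failure mode is a helper that rebuilds assistant messages from only role, content, and tool_calls, silently dropping the reasoning field.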