Qwen3.6-35B-A3B: 3B Active Params Rival 30B Dense Models

Qwen3.6-35B-A3B uses sparse MoE to activate only 3B of its 35B params, delivering top agentic coding scores such as 73.4 on SWE-bench Verified and 51.5 on Terminal-bench 2.0, while also handling vision tasks at 81.7 MMMU.

Sparse MoE Cuts Inference Costs While Matching Dense Giants

Mixture of Experts (MoE) routes each token to 8 specialized experts plus 1 shared expert out of 256 total, activating just 3B params of the 35B during inference. This keeps compute, latency, and cost tied to active params rather than total size, which is ideal for scaling agentic apps without 10x the hardware. The architecture stacks 10 repeated blocks, each consisting of three (Gated DeltaNet → MoE) layers for cheap linear attention plus one (Gated Attention → MoE) layer using Grouped Query Attention (16 query heads, 2 KV heads) to slash KV-cache memory. Native 262k context extends to 1M+ tokens via YaRN, enabling long agent traces without overflow.
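
To make the routing concrete, here is a minimal PyTorch sketch of top-k expert selection with an always-active shared expert. Dimensions, expert count, and top-k are toy values for illustration, not the model's real configuration (which routes each token to 8 of 256 experts plus 1 shared expert).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy sparse MoE layer: each token goes to its top-k routed experts
    plus one always-on shared expert. Sizes are illustrative only."""
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        logits = self.router(x)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the selected experts only
        routed = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens whose slot-th pick is expert e
                if mask.any():
                    routed[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return self.shared_expert(x) + routed      # shared expert sees every token

moe = SparseMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); only selected experts ran per token
```

Per-token compute scales with top_k and the shared expert, not with the full expert count, which is why the active-param budget stays small.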

Agentic Coding Beats Larger Models on Real Tasks

For GitHub issue resolution, it scores 73.4 on SWE-bench Verified (vs. 70.0 for the prior Qwen3.5-35B-A3B and 52.0 for Gemma4-31B). On Terminal-bench 2.0 (3-hour real terminal tasks), it hits 51.5, the highest in the comparison against Qwen3.5-27B (41.6), Gemma4-31B (42.9), and Qwen3.5-35B-A3B (40.5). Frontend work shines on QwenWebBench (web design, apps, games, SVG, visualization, animation, 3D): 1397 points vs. 1068 for Qwen3.5-27B and 978 for the prior A3B. Use it for autonomous code agents; its efficiency lets you run locally or on cheap cloud hardware without dense-model bills.
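
A hedged sketch of driving the model as a coding agent through an OpenAI-compatible endpoint (for example a local vLLM server); the base_url, the model ID, and the run_shell tool are assumptions for illustration, not confirmed integration details.

```python
# Hypothetical agent step: the model decides which shell command to run next.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical tool that the surrounding agent loop executes
        "description": "Run a shell command in the repository and return stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B",  # assumed model ID
    messages=[
        {"role": "system", "content": "You are an autonomous coding agent. Use run_shell to inspect and fix the repository."},
        {"role": "user", "content": "Tests in tests/test_parser.py fail on empty input. Find and fix the bug."},
    ],
    tools=tools,
)
# The agent loop would execute these tool calls and feed results back as tool messages.
print(resp.choices[0].message.tool_calls)
```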

Multimodal Vision Handles Images, Docs, Video

The vision encoder processes images, documents, video, and spatial data. On MMMU (multi-discipline image reasoning), 81.7 beats Claude 3.5 Sonnet (79.6) and Gemma4-31B (80.4). On RealWorldQA (real-world photo contexts), 85.3 tops Qwen3.5-27B (83.7) and crushes Claude (70.3) and Gemma (72.3). ODInW13 object detection reaches 50.8 (up from 42.6 for the prior model), and VideoMMMU hits 83.7 over Claude (77.6) and Gemma (81.6). Pair it with RAG pipelines for visual agents that analyze screenshots and charts.
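
A minimal sketch of sending a chart screenshot to the vision side over the same OpenAI-compatible interface; the endpoint, model ID, and file name are placeholders, and the image-content message format is the standard multimodal chat shape rather than anything model-specific.

```python
# Ask the model about a local screenshot by inlining it as a base64 data URL.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

with open("dashboard.png", "rb") as f:  # placeholder screenshot to analyze
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Summarize the error-rate trend in this chart and flag any anomalies."},
        ],
    }],
)
print(resp.choices[0].message.content)
```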

Thinking Mode Controls Reasoning for Agents

The default thinking mode wraps chain-of-thought in tags; disable it via the API ("enable_thinking": False) for direct, lower-latency outputs (the inline /think switch is unsupported from Qwen3 onward). Enable preserve_thinking to retain historical reasoning blocks, which boosts agent consistency, reduces recompute, and optimizes the KV cache over long sessions. Weights are Apache 2.0 on Hugging Face; see the Qwen blog for full integration details.
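
A minimal sketch of toggling thinking mode with Hugging Face Transformers, assuming the repo ID Qwen/Qwen3.6-35B-A3B and the enable_thinking chat-template switch Qwen uses for its recent text models; a multimodal checkpoint may need a different loading class than the one shown here.

```python
# Generate a direct (non-thinking) answer by disabling the reasoning block in the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-35B-A3B"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Refactor this recursive function to be iterative."}]

# enable_thinking=False drops the chain-of-thought block for low-latency, direct outputs.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```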

Summarized by x-ai/grok-4.1-fast via openrouter
