Massive Scale and Efficiency Architecture

DeepSeek V4 Pro totals 1.6T parameters (49B active) and Flash 284B (13B active); both are Mixture-of-Experts models with 1M-token context, released under the MIT license. That makes Pro the largest open-weights model, surpassing Kimi K2.6 (1.1T) and GLM-5.1 (754B). The Pro weights are 865GB on Hugging Face and Flash's are 160GB; quantized, Flash could potentially run on 128GB hardware by streaming active experts. The efficiency gains show most in long contexts: at 1M tokens, Pro uses 27% of V3.2's single-token FP8 FLOPs and 10% of its KV cache, while Flash hits 10% FLOPs and 7% KV cache, enabling low pricing without performance loss.
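The sparsity these figures imply can be sanity-checked with a little arithmetic. This sketch only restates numbers from the text; the one-byte-per-parameter FP8 assumption is mine:

```python
# Parameter counts from the text (total, active per token).
PRO_TOTAL, PRO_ACTIVE = 1600e9, 49e9      # 1.6T total, 49B active
FLASH_TOTAL, FLASH_ACTIVE = 284e9, 13e9   # 284B total, 13B active

# Fraction of weights touched per token in each MoE model.
pro_fraction = PRO_ACTIVE / PRO_TOTAL      # ~3.1%
flash_fraction = FLASH_ACTIVE / FLASH_TOTAL  # ~4.6%

print(f"Pro activates {pro_fraction:.1%} of its parameters per token")
print(f"Flash activates {flash_fraction:.1%}")

# At FP8 (assuming ~1 byte/parameter), only the active expert set
# needs to be resident in fast memory at any moment:
flash_hot_gb = FLASH_ACTIVE / 1e9
print(f"Flash hot weights at FP8: ~{flash_hot_gb:.0f} GB")
```

This is why streaming experts from slower storage is plausible: the hot working set per token is a small slice of the full checkpoint.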

Creative Output Matches Visual Quality of Prior DeepSeeks

Testing SVG generation of 'pelican riding a bicycle' via OpenRouter shows solid results: Flash produces an excellent bike frame, chain, and reflector, with a mean-faced pelican gripping the handlebars; Pro delivers a capable bike (jagged spokes aside) but a distorted pelican body. Both improve on the coherence of the equivalent V3.2, V3.1, and V3-0324 outputs, demonstrating practical capability for generative tasks at this scale.
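A minimal sketch of how such a test request could be assembled against OpenRouter's OpenAI-compatible chat completions endpoint. The endpoint URL and payload shape follow OpenRouter's public API; the model slug and prompt come from the text, and actually sending the request (commented out) assumes an OPENROUTER_API_KEY environment variable:

```python
import json

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
URL = "https://openrouter.ai/api/v1/chat/completions"

def build_svg_request(model: str) -> dict:
    """Assemble the request body for the pelican-on-a-bicycle SVG test."""
    return {
        "model": model,
        "messages": [
            {"role": "user",
             "content": "Generate an SVG of a pelican riding a bicycle"},
        ],
    }

payload = build_svg_request("deepseek/deepseek-v4-pro")
print(json.dumps(payload, indent=2))

# To actually run the test (requires an API key):
# import os, urllib.request
# req = urllib.request.Request(
#     URL,
#     data=json.dumps(payload).encode(),
#     headers={
#         "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
#         "Content-Type": "application/json",
#     },
# )
# reply = json.loads(urllib.request.urlopen(req).read())
# svg = reply["choices"][0]["message"]["content"]
```

Swapping the model slug to the Flash variant reruns the same test against the cheaper model for a side-by-side comparison.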

Lowest Pricing Among Peers

DeepSeek undercuts frontier models: Flash, at $0.14/M input and $0.28/M output, beats GPT-5.4 Nano ($0.20/$1.25) and Gemini 3.1 Flash-Lite ($0.25/$1.50); Pro, at $1.74/$3.48, undercuts Gemini 3.1 Pro ($2/$12), GPT-5.4 ($2.50/$15), and Claude Sonnet 4.6 ($3/$15). For instant access, run llm install llm-openrouter, then llm openrouter refresh, then llm -m openrouter/deepseek/deepseek-v4-pro 'prompt'. Unsloth quantizations for local runs are still to come.
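The per-million-token prices quoted above translate into workload costs with simple arithmetic; this sketch uses only the figures from the text, and the 10M-input/2M-output workload is an arbitrary illustration:

```python
# Per-million-token prices (USD) quoted in the text: (input, output).
prices = {
    "DeepSeek Flash":        (0.14, 0.28),
    "GPT-5.4 Nano":          (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "DeepSeek Pro":          (1.74, 3.48),
    "Gemini 3.1 Pro":        (2.00, 12.00),
    "GPT-5.4":               (2.50, 15.00),
    "Claude Sonnet 4.6":     (3.00, 15.00),
}

def cost_usd(model: str, input_m: float, output_m: float) -> float:
    """Cost of a workload, given millions of input and output tokens."""
    inp, out = prices[model]
    return input_m * inp + output_m * out

# Example workload: 10M input tokens, 2M output tokens.
for model in sorted(prices, key=lambda m: cost_usd(m, 10, 2)):
    print(f"{model:<22} ${cost_usd(model, 10, 2):7.2f}")
```

On that workload the two DeepSeek models bracket their respective peer groups from below, which is the pricing claim in a nutshell.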

Benchmarks Trail Leaders by 3-6 Months

Self-reported scores position V4-Pro-Max ahead of GPT-5.2 and Gemini-3.0-Pro on reasoning when given expanded token budgets, but roughly 3-6 months behind GPT-5.4 and Gemini-3.1-Pro: near-frontier performance at a fraction of the cost.