DeepSeek V4: Open 1.6T Model Beats Closed SOTA on Agents
DeepSeek V4 releases open-weights 1.6T and 284B models trained on 32T tokens with 1M context, using 27% of V3.2's FLOPs and 10% of its KV cache, and rivals closed models on agentic tasks at 15¢/M input tokens.
Unmatched Efficiency for Massive Scale
DeepSeek V4 launches two open-weights models: a 1.6-trillion-parameter Pro and a 284-billion-parameter Flash, both trained on 32-33 trillion tokens with 1-million-token context windows. At 1M context, Pro uses just 27% of DeepSeek V3.2's FLOPs and 10% of its KV cache, making it one of the most efficient large models available; Flash is even leaner at 10% FLOPs and 7% KV cache versus V3.2 despite being one-third its size. This slashes inference costs and speeds up runs, and the models are validated on Nvidia GPUs and Huawei Ascend NPUs. Open base weights enable easy fine-tuning, narrowing the open-source gap to closed models to an estimated 3-6 months.
Pricing undercuts Western competitors: Pro costs $0.15/M input tokens on a cache hit, $1.75/M on a miss, and $3.50-$4/M for output. Capacity currently limits the Pro service, but prices should fall once the 950 super nodes launch later this year. Test it free on the DeepSeek playground.
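For a sense of what these rates mean in practice, here is a back-of-the-envelope cost sketch using the prices quoted above; `request_cost_usd` and the cache-hit rate are hypothetical illustrations, not anything DeepSeek publishes:

```python
# Rough cost estimate at the quoted DeepSeek V4 Pro rates.
# The cache_hit_rate is a made-up parameter; real workloads vary widely.

def request_cost_usd(input_tokens, output_tokens, cache_hit_rate=0.5,
                     hit_price=0.15, miss_price=1.75, output_price=3.50):
    """All prices are USD per million tokens, as quoted above."""
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    return (hit_tokens * hit_price
            + miss_tokens * miss_price
            + output_tokens * output_price) / 1_000_000

# Example: 100k input tokens (half cached) plus 10k output tokens.
print(round(request_cost_usd(100_000, 10_000), 4))  # → 0.13
```

Even a long agentic run with a six-figure token count lands at a few tens of cents, which is the point of the pricing pitch.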
Agentic Strengths Outshine Knowledge Benchmarks
Pro matches or exceeds closed SOTA such as Gemini 3.1 Pro and o1 on agentic tasks, its standout area, while lagging slightly on knowledge and reasoning (e.g., behind Gemini 3.1 Pro on SimpleQA Verified). Flash holds strong on agentic tasks too, nearing Pro. A practical workflow: plan with o1 or Claude Opus, then hand implementation to Pro, leveraging its speed and cost. The benchmark tables, grouped into knowledge/reasoning and agents, make this divide easy to see; still, test on your own data, even though DeepSeek has been transparent about results.
Architectural wins such as compressed sparse attention cut KV-cache memory, boosting long-context agentic flows. There is no native agent harness yet, but V4 integrates with Claude Code, OpenClaw, or OpenCode for interleaved tool use.
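The article doesn't detail the attention design, but the KV-cache savings can be sketched in the spirit of DeepSeek's published latent-attention idea: cache one small latent vector per token and re-expand to keys and values at attention time. All dimensions below are illustrative assumptions, not the model's actual configuration:

```python
import numpy as np

# Illustrative low-rank KV compression: instead of caching full K/V per
# head, cache a small latent vector per token and re-expand on the fly.
# These sizes are made up for illustration only.
d_model, n_heads, d_head, d_latent = 1024, 8, 128, 64
rng = np.random.default_rng(0)

W_down = rng.normal(size=(d_model, d_latent)) * 0.02           # compress
W_up_k = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02  # expand to K
W_up_v = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02  # expand to V

seq_len = 16
h = rng.normal(size=(seq_len, d_model))  # hidden states for 16 tokens

latent_cache = h @ W_down   # this small matrix is all that gets cached
k = latent_cache @ W_up_k   # keys recomputed at attention time
v = latent_cache @ W_up_v   # values recomputed at attention time

full_cache_floats = seq_len * 2 * n_heads * d_head  # K and V per token
latent_cache_floats = seq_len * d_latent
print(latent_cache_floats / full_cache_floats)      # → 0.03125
```

The cached footprint shrinks by the ratio of the latent width to the full K+V width, which is how a compressed-attention design can claim single-digit-percent KV cache versus a conventional baseline.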
Delivers Functional Outputs with Detailed Chain-of-Thought
Prompts trigger verbose chain-of-thought (token-heavy, 2-4 minutes of thinking), enabling backtracking and planning on complex tasks. Detailed instructions yield precise results; vague ones produce slop.
Examples:
- Website with toggle, animations: Fully functional HTML/CSS/JS, minor hover bugs but follows specs closely.
- Procedural pagoda garden in Three.js: Builds progressively, functional despite basic design.
- Real-time ISS tracker: Fetches API every 5s, renders accurate Earth/continents, shows lat/long, zoom, sun position, next-update timer (minor coord/API glitches).
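The ISS tracker's polling loop can be approximated as below; the Open Notify endpoint is a well-known public API, but the generated page's actual source is not shown in the article, so treat this as a sketch of the pattern, not the model's output:

```python
import json
import time
import urllib.request

# Public ISS-position endpoint (Open Notify); returns lat/lon as strings.
ISS_API = "http://api.open-notify.org/iss-now.json"

def parse_position(payload: bytes) -> tuple[float, float]:
    """Extract (lat, lon) floats from an iss-now.json response body."""
    data = json.loads(payload)
    pos = data["iss_position"]
    return float(pos["latitude"]), float(pos["longitude"])

def poll(interval_s: float = 5, ticks: int = 3) -> None:
    """Fetch and print the ISS position every `interval_s` seconds."""
    for _ in range(ticks):
        with urllib.request.urlopen(ISS_API, timeout=10) as resp:
            lat, lon = parse_position(resp.read())
        print(f"ISS at lat={lat:+.2f}, lon={lon:+.2f}")
        time.sleep(interval_s)
```

Calling `poll()` prints one position line every five seconds, matching the tracker example's refresh cadence (network access required).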
Inference is fast once thinking completes, but switching browser tabs pauses generation (a possible bug). V4 is strong for agentic coding even without a harness; its chain-of-thought mimics deliberate reasoning, making it well suited to production tooling.