DeepSeek V4 Tests: 3D Code Strong, SVG & QA Weak

DeepSeek's likely V4 model in Expert mode builds usable 3D floor plans and Pokeballs via Three.js, but it fails on panda SVGs, chess autoplay, and butterfly scenes, and it stalls midway on simple QA.

Expert Mode Delivers Bigger Outputs but Limits Concurrency

DeepSeek's new interface offers two modes: Expert for the most powerful generations (likely V4) and Instant for image prompts and multimodal tasks. Expert mode processes one prompt at a time without parallel threads, keeping compute focused on complex requests. Attaching an image automatically switches to Instant, confirming multimodal support. Use Expert for single, high-fidelity code outputs like full HTML files with Three.js; avoid it for batch testing due to the one-at-a-time restriction.

3D Generation Succeeds on Practical Layouts and Objects

For a 1585 square foot 3D floor plan with two rooms and two washrooms, Expert mode outputs a single runnable HTML file built with CSS, JavaScript, and Three.js. The result shows an accurate layout: visible bathrooms and bedrooms, fully navigable and usable. Similarly, a Three.js Pokeball prompt generates a polished, dark-blue-tinted sphere comparable in refinement to output from models like GPT-4o. These tests suggest DeepSeek V4 handles interactive 3D architecture and object modeling reliably: copy the HTML, open it in a browser, and interact immediately without tweaks.
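The video shows only the rendered results, not DeepSeek's source, so for scale: a single-file Three.js Pokeball of the kind described can be sketched in roughly forty lines. Below is a minimal hand-written version, not DeepSeek's output, assuming a CDN module import of three@0.160.0; colors and proportions are illustrative.

```html
<!DOCTYPE html>
<html>
<body style="margin:0">
<script type="module">
import * as THREE from 'https://unpkg.com/three@0.160.0/build/three.module.js';

const scene = new THREE.Scene();
scene.background = new THREE.Color(0x10142a); // dark-blue tint, as described in the test

const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 100);
camera.position.set(0, 1, 3.5);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

// Pokeball = red top hemisphere + white bottom hemisphere + black band + button
const half = Math.PI / 2;
const top = new THREE.Mesh(
  new THREE.SphereGeometry(1, 48, 24, 0, Math.PI * 2, 0, half),
  new THREE.MeshStandardMaterial({ color: 0xcc1122 }));
const bottom = new THREE.Mesh(
  new THREE.SphereGeometry(1, 48, 24, 0, Math.PI * 2, half, half),
  new THREE.MeshStandardMaterial({ color: 0xffffff }));
const band = new THREE.Mesh(
  new THREE.TorusGeometry(1, 0.05, 16, 64),
  new THREE.MeshStandardMaterial({ color: 0x111111 }));
band.rotation.x = half; // lay the torus flat around the equator
const button = new THREE.Mesh(
  new THREE.CylinderGeometry(0.16, 0.16, 0.08, 32),
  new THREE.MeshStandardMaterial({ color: 0xeeeeee }));
button.rotation.x = half; // point the cylinder cap toward the camera
button.position.z = 1.0;  // sit half-embedded on the sphere's surface

const ball = new THREE.Group();
ball.add(top, bottom, band, button);
scene.add(ball);

scene.add(new THREE.AmbientLight(0xffffff, 0.4));
const sun = new THREE.DirectionalLight(0xffffff, 1.2);
sun.position.set(3, 5, 4);
scene.add(sun);

(function animate() {
  requestAnimationFrame(animate);
  ball.rotation.y += 0.01; // slow spin so the band and button stay visible
  renderer.render(scene, camera);
})();
</script>
</body>
</html>
```

Opening the file in any WebGL-capable browser is enough; no build step is needed, which matches the copy-and-run workflow described above.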

Creative SVGs, Complex Scenes, and Functionality Fall Short

An SVG prompt for a panda holding a burger produces disproportionate hands and low overall quality, lacking polish. A 3D chessboard with all pieces and autoplay for legal moves looks visually impressive, but autoplay fails entirely: pieces render, yet no opponent simulation or win detection works. A majestic 3D butterfly in a blue garden with camera controls resembles a distorted character (something like Gardevoir) more than an insect; basic movement functions but lacks detail and accuracy. The trade-off: strong visuals don't guarantee working interactions, so test functionality after generation.
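For context on the SVG test, the bar is not high: a recognizable panda-with-burger takes only a handful of primitives. Here is a minimal hand-drawn sketch of the idea, purely illustrative and not DeepSeek's output:

```html
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200" width="200">
  <!-- ears -->
  <circle cx="60" cy="50" r="20" fill="#222"/>
  <circle cx="140" cy="50" r="20" fill="#222"/>
  <!-- head -->
  <circle cx="100" cy="85" r="50" fill="#fff" stroke="#222" stroke-width="3"/>
  <!-- eye patches and eyes -->
  <ellipse cx="80" cy="80" rx="12" ry="16" fill="#222"/>
  <ellipse cx="120" cy="80" rx="12" ry="16" fill="#222"/>
  <circle cx="82" cy="78" r="4" fill="#fff"/>
  <circle cx="118" cy="78" r="4" fill="#fff"/>
  <!-- nose -->
  <ellipse cx="100" cy="100" rx="8" ry="5" fill="#222"/>
  <!-- burger: top bun dome, patty, bottom bun -->
  <path d="M70 150 a30 18 0 0 1 60 0 z" fill="#e0a04a"/>
  <rect x="70" y="150" width="60" height="10" fill="#7a4a1e"/>
  <rect x="70" y="160" width="60" height="8" rx="4" fill="#e0a04a"/>
  <!-- paws holding the burger -->
  <circle cx="66" cy="158" r="10" fill="#222"/>
  <circle cx="134" cy="158" r="10" fill="#222"/>
</svg>
```

Proportion problems like the oversized hands the model produced show up exactly in elements like the paw circles here, where radius and placement must stay in ratio to the head.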

Reasoning Stalls on Simple QA, Hinting at Scale Limits

Basic question-answering gets stuck midway, failing to complete responses; these issues may be resolved in API versions, but they expose the limits of the current web interface. Overall, V4 shows promise over prior models but trails DeepSeek R1 in size and consistency, so wait for the full release before production use. Prioritize it for 3D code prototypes, where it outperforms on usability.

Video description
In this video, I'll be talking about DeepSeek's newly rolled-out model and updated interface, which many people believe could be DeepSeek V4. I tested it across several coding, SVG, 3D, and reasoning tasks to see how well it performs and whether it actually lives up to the hype.

Key Takeaways:
🚀 DeepSeek appears to be rolling out a brand-new model and interface, and it may be DeepSeek V4.
🧠 The new Expert mode seems to be the more powerful option, while Instant mode handles image prompts and multimodal tasks.
🏠 DeepSeek performed well on some generation tests, especially the 3D floor plan and the Three.js Pokeball.
🎨 Some creative outputs, like the panda SVG and butterfly scene, were noticeably weaker and had quality issues.
♟️ The chess board demo looked visually impressive, but the autoplay feature did not work properly.
🌲 The 3D Minecraft-style demo was promising, although the controls did not function correctly.
📉 On simpler question-answering tests, the model sometimes got stuck midway, showing that it still has limitations.
👍 Overall, the update looks promising, but it may not be as large or as strong as DeepSeek R1.
