Visual Primitives Solve LMM Reference Gap
DeepSeek's withdrawn paper introduces 'Thinking with Visual Primitives'—embedding bounding boxes and points into every reasoning step—to fix ambiguous referencing in multimodal models, achieving 77.2% on spatial benchmarks with 10x fewer tokens than rivals.






