#multimodal
Visual Primitives Solve LMM Reference Gap
DeepSeek's withdrawn paper introduces "Thinking with Visual Primitives," which embeds bounding boxes and points directly into every reasoning step to resolve ambiguous referencing in multimodal models. The approach reaches 77.2% on spatial benchmarks while using 10x fewer tokens than rival models.
Data and Beyond