Gemini Robotics-ER 1.6 Sharpens Robot Planning and Perception
DeepMind's Gemini Robotics-ER 1.6 outperforms prior models in object pointing, counting, and task success recognition, while enabling robots to read instruments like pressure gauges via agentic image processing and code execution.
Outperforms Prior Models in Essential Robotics Tasks
Gemini Robotics-ER 1.6 serves as a high-level reasoning layer for robots, interpreting the robot's surroundings to plan tasks autonomously and calling external tools, such as Google Search or vision-language-action models, as needed. It surpasses Gemini Robotics-ER 1.5 and Gemini 3.0 Flash specifically at pointing to objects, counting items, and detecting successful task completion: core skills for reliable robot operation in dynamic environments. These gains let robots handle perception-heavy workflows without constant human oversight, reducing errors in real-world deployment.
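The pointing and counting capabilities are exposed through ordinary Gemini API calls. A minimal sketch with the google-genai Python SDK follows; the model ID, image path, and the 0-to-1000 normalized point format (the convention earlier Gemini Robotics-ER releases used) are assumptions here, not confirmed details of the 1.6 preview:

```python
import json
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Load a scene image, e.g., a frame from the robot's camera (path is illustrative).
with open("workbench.jpg", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

prompt = (
    "Point to every screwdriver in the image. Respond as a JSON list of "
    '{"point": [y, x], "label": <name>} entries with coordinates '
    "normalized to 0-1000."
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.6-preview",  # assumed preview model ID
    contents=[image, prompt],
    # Structured JSON output keeps the reply machine-parseable; support
    # for this config on the robotics preview is assumed.
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)

points = json.loads(response.text)
print(f"Found {len(points)} screwdriver(s)")
for p in points:
    print(p["label"], "at (y, x) =", tuple(p["point"]))
```

Counting falls out of pointing for free: the number of returned points is the object count, and the same coordinates can seed a downstream grasp or navigation planner.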
Instrument Reading via Agentic Processing and Code Execution
For reading analog instruments like pressure gauges and sight glasses, the model combines agentic image processing with code execution: it zooms into fine details on displays, applies pointing functions to measure proportions, calculates scale and distances programmatically, and interprets results using embedded world knowledge. Developed in collaboration with Boston Dynamics, this capability powers their Spot robot for autonomous system inspections, turning imprecise visual data into actionable metrics without specialized hardware.
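Below is a hedged sketch of how that measure-and-compute loop might be driven through the API, using the standard Gemini code execution tool; the model ID and prompt are illustrative, and it is an assumption that the preview's agentic gauge reading is exposed via this exact tool:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Photo of an analog gauge captured during an inspection run (illustrative path).
with open("pressure_gauge.jpg", "rb") as f:
    gauge = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.6-preview",  # assumed preview model ID
    contents=[
        gauge,
        "Read the pressure gauge. Zoom into the dial, locate the needle and "
        "the scale endpoints, compute the needle's angle as a proportion of "
        "the full scale, and report the reading in PSI with your reasoning.",
    ],
    # Enable server-side code execution so the model can compute angles
    # and proportions programmatically instead of eyeballing them.
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

# The reply interleaves generated Python, its execution output, and prose.
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print("--- generated code ---\n", part.executable_code.code)
    if part.code_execution_result:
        print("--- execution result ---\n", part.code_execution_result.output)
    if part.text:
        print(part.text)
```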
Immediate Access for Robot Builders
Integrate via the Gemini API or Google AI Studio; a ready-made Colab notebook demonstrates setup and usage. Start prompting the preview model directly to test planning and perception in your robotics prototypes, shortening the path from demo to production without custom training.
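As a first smoke test, the task-success capability needs nothing more than two frames and a question. A sketch under the same assumed preview model ID, with illustrative file names and task prompt:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def load_frame(path: str) -> types.Part:
    """Wrap a JPEG file as an inline image part (paths are illustrative)."""
    with open(path, "rb") as f:
        return types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.6-preview",  # assumed preview model ID
    contents=[
        load_frame("before.jpg"),
        load_frame("after.jpg"),
        "Task: place the red block inside the bin. Comparing the two frames, "
        "did the task succeed? Answer YES or NO, then justify briefly.",
    ],
)
print(response.text)
```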