Gemini Robotics Powers Generalist Physical Agents

Dual-Model System for Vision, Reasoning, and Action

Gemini Robotics pairs two models. A vision-language-action (VLA) model, Gemini Robotics 1.5, takes visual inputs and language instructions and outputs motor commands to carry out tasks. An embodied reasoning model, Gemini Robotics-ER 1.5, handles high-level planning and logical decisions without directly controlling the robot. A lightweight on-device VLA variant runs locally on the robot and can be fine-tuned by developers for custom applications. This setup lets a single model adapt to diverse robot forms, transferring skills across static bi-arm platforms (ALOHA, Bi-arm Franka) and humanoids (Apptronik Apollo), which accelerates learning without embodiment-specific retraining.
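
The division of labor described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `ReasoningModel`, `VLAModel`, `MotorCommand`, and `run_task` are illustrative stand-ins and not part of any Gemini Robotics SDK, and the "planning" is a toy string split rather than a model call. The sketch only shows the shape of the loop, in which a planner produces steps and never issues motor commands, while an embodiment-specific executor turns each step into a command.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotorCommand:
    """Hypothetical stand-in for low-level actuator targets."""
    step: str
    embodiment: str


class ReasoningModel:
    """Hypothetical planner role: decomposes a goal, never moves the robot."""

    def plan(self, goal: str) -> List[str]:
        # Toy decomposition (assumption): steps are separated by " then ".
        return [part.strip() for part in goal.split(" then ") if part.strip()]


class VLAModel:
    """Hypothetical executor role: maps (step, camera frame) to a command."""

    def __init__(self, embodiment: str) -> None:
        self.embodiment = embodiment

    def act(self, step: str, camera_frame: str) -> MotorCommand:
        # A real VLA model would condition on the image; this stub ignores it.
        return MotorCommand(step=step, embodiment=self.embodiment)


def run_task(goal: str, planner: ReasoningModel, vla: VLAModel,
             camera_frame: str) -> List[MotorCommand]:
    """Plan once with the reasoning model, then execute each step with the VLA."""
    return [vla.act(step, camera_frame) for step in planner.plan(goal)]


if __name__ == "__main__":
    goal = "pick up the cup then place it on the shelf"
    for embodiment in ("bi-arm", "humanoid"):
        for command in run_task(goal, ReasoningModel(), VLAModel(embodiment), "frame-0"):
            print(command)
```

In this sketch the same planner output is reused for both embodiments and only the executor changes, which loosely mirrors the cross-embodiment idea in the text.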

Capabilities Enabling Complex Real-World Tasks

Robots powered by these models generalize to novel situations by breaking goals into steps, handling multi-step tasks autonomously, and recovering from interruptions. Their agentic behavior includes calling tools such as Google Search to look up information during planning. They "think before acting" by explaining their reasoning in natural language, accept conversational redirects phrased in everyday language rather than technical jargon, and perform dexterous manipulations such as folding origami, packing lunchboxes, and preparing salad. They also adapt mid-task to environmental changes or user inputs, which supports agentic tool use, embodied reasoning in new scenes, and cross-embodiment motion transfer.
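
The mid-task redirect behavior described above can be sketched as a simple replanning loop. This is an illustration under stated assumptions, not the actual mechanism: `plan` and `run_with_redirects` are hypothetical names, goals are comma-separated strings, and a redirect is modeled as a new goal arriving at a given step index.

```python
from typing import Dict, List


def plan(goal: str) -> List[str]:
    """Toy planner (assumption): a goal is a comma-separated list of steps."""
    return [step.strip() for step in goal.split(",") if step.strip()]


def run_with_redirects(goal: str, redirects: Dict[int, str]) -> List[str]:
    """Execute steps in order, replanning when a redirect arrives.

    `redirects` maps a tick (number of steps executed so far) to a new goal
    supplied by the user at that moment. Remaining steps of the old plan are
    discarded and replaced by the plan for the new goal.
    """
    executed: List[str] = []
    steps = plan(goal)
    tick = 0
    while steps:
        if tick in redirects:
            steps = plan(redirects[tick])  # drop the old plan, adopt the new one
            if not steps:
                break  # an empty redirect means there is nothing left to do
        executed.append(steps.pop(0))
        tick += 1
    return executed


if __name__ == "__main__":
    log = run_with_redirects(
        "open lunchbox, add sandwich, close lunchbox",
        {1: "add apple, close lunchbox"},  # user changes their mind after step 1
    )
    print(log)
```

A real system would replan from the current scene rather than from a string, but the control flow is the same: check for new input before each step and rebuild the remaining plan when it arrives.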

Developer Access and Responsible Deployment

A preview of Gemini Robotics-ER 1.5 is available in Google AI Studio, and developers can join a waitlist for the full SDK to integrate the models with custom robots. The Google DeepMind Accelerator supports early-stage startups building physical AI with these models. For safety, the approach combines proactive safeguards, collaboration with experts, and a Responsibility and Safety Council to mitigate risks in real-world deployment.