Automating Android Tasks with Gemini 3.5 Flash Computer Use

The Computer Use Loop

Gemini 3.5 Flash exposes 'Computer Use' as a native tool: the model observes screenshots of a graphical interface and responds with structured function calls. The interaction is a continuous feedback loop. The model receives a screenshot and a goal and returns a function call (e.g., click, type, swipe); a bridge executes that action on the device via Android Debug Bridge (ADB); a new screenshot of the updated state is then captured and sent back to the model.
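The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the Action class and the three callables (capture_screenshot, ask_model, execute_action) are hypothetical stand-ins for the screenshot capture, the Gemini API call, and the ADB bridge, and the step cap is an added safeguard.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """One function call returned by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def run_agent_loop(
    goal: str,
    capture_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes, list], Optional[Action]],
    execute_action: Callable[[Action], None],
    max_steps: int = 20,
) -> list:
    """Observe, ask, act, repeat until the model stops or the cap is hit."""
    history: list = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                # observe current state
        action = ask_model(goal, screenshot, history)    # model picks next step
        if action is None:                               # assumed "done" signal
            break
        execute_action(action)                           # bridge acts on device
        history.append(action)
    return history
```

Passing the three steps in as callables keeps the loop independent of any particular model client or device bridge, so it can be exercised with fakes before touching a real device.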

Implementing the Bridge

To enable this, you must build a bridge that translates the model's normalized coordinate system (a 0-999 grid) into the specific pixel resolution of the target device. The implementation requires handling several core actions:
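As a rough sketch of the coordinate translation, the function below scales a value on the 0-999 grid to device pixels. The function names, the divisor of 1000, and the clamping behavior are assumptions made for illustration; the actual bridge may round or validate differently.

```python
def denormalize(value: int, size_px: int, grid: int = 1000) -> int:
    """Scale a normalized coordinate (0..grid-1) to a pixel index."""
    value = max(0, min(value, grid - 1))        # clamp out-of-range input
    return min(value * size_px // grid, size_px - 1)


def denormalize_point(x: int, y: int, width_px: int, height_px: int) -> tuple:
    """Convert a normalized (x, y) pair to device pixels."""
    return denormalize(x, width_px), denormalize(y, height_px)


# Center of the grid on a 1080x2400 screen.
print(denormalize_point(500, 500, 1080, 2400))  # (540, 1200)
```

Integer arithmetic avoids floating-point rounding surprises, and clamping keeps a slightly out-of-range model output from producing a tap outside the screen.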

Navigation: click, long_press, scroll, and go_back.
Input: type (text input) and press_key (system keys like home or back).
State Management: open_app and list_apps to manage the device environment.
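The actions listed above could map onto adb shell commands roughly as follows. This is a hedged sketch: the argument names (x, y, text, key, direction, package), the key-name table, and the swipe distances and durations are assumptions, not the model's documented schema. The adb subcommands themselves (input tap, input swipe, input text, input keyevent, monkey, pm list packages) are standard Android tooling.

```python
import subprocess

# Assumed mapping from model key names to Android keycodes.
KEYCODES = {"home": "KEYCODE_HOME", "back": "KEYCODE_BACK", "enter": "KEYCODE_ENTER"}


def to_pixels(value: int, size_px: int) -> int:
    """Scale a 0-999 grid coordinate to pixels, clamped to the screen."""
    return min(max(value, 0) * size_px // 1000, size_px - 1)


def build_adb_command(name: str, args: dict, width_px: int, height_px: int) -> list:
    """Translate one model action into an adb argv list (not executed here)."""
    shell = ["adb", "shell"]
    if name in ("click", "long_press"):
        x = str(to_pixels(args["x"], width_px))
        y = str(to_pixels(args["y"], height_px))
        if name == "click":
            return shell + ["input", "tap", x, y]
        # A swipe that starts and ends at the same point acts as a long press.
        return shell + ["input", "swipe", x, y, x, y, "800"]
    if name == "scroll":
        cx, cy, dy = width_px // 2, height_px // 2, height_px // 4
        # Scrolling content down means dragging the finger up (only up/down handled).
        start, end = (cy + dy, cy - dy) if args["direction"] == "down" else (cy - dy, cy + dy)
        return shell + ["input", "swipe", str(cx), str(start), str(cx), str(end), "300"]
    if name == "type":
        # "input text" expects spaces encoded as %s; other shell
        # metacharacters are not escaped in this sketch.
        return shell + ["input", "text", args["text"].replace(" ", "%s")]
    if name == "press_key":
        return shell + ["input", "keyevent", KEYCODES[args["key"]]]
    if name == "go_back":
        return shell + ["input", "keyevent", "KEYCODE_BACK"]
    if name == "open_app":
        return shell + ["monkey", "-p", args["package"],
                        "-c", "android.intent.category.LAUNCHER", "1"]
    if name == "list_apps":
        return shell + ["pm", "list", "packages"]
    raise ValueError(f"unsupported action: {name}")


def execute(command: list) -> str:
    """Run a built command on the connected device and return its stdout."""
    return subprocess.run(command, check=True, capture_output=True, text=True).stdout
```

Separating command construction from execution makes the translation easy to test without a device attached; execute is only needed once adb is on the PATH and a device is connected.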

The provided implementation uses a Python ADBBridge class that wraps adb shell commands. The author notes that this synchronous approach is not production-ready: a production version should add robust error handling, asynchronous execution, and specific logic for safety_decision flags, which the model may raise for sensitive actions such as payments or state changes.
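One way to handle safety_decision flags is to gate each action behind a confirmation callback before it reaches the device. The sketch below assumes the flag arrives as a safety_decision entry in the function call's arguments, optionally carrying an explanation; that shape, the should_execute name, and the confirm callback are illustrative assumptions, so check the API documentation for the actual structure.

```python
from typing import Callable


def should_execute(args: dict, confirm: Callable[[str], bool]) -> bool:
    """Return True if the action may run; ask for confirmation when flagged."""
    decision = args.get("safety_decision")       # assumed location of the flag
    if not decision:
        return True                              # nothing flagged, proceed
    if isinstance(decision, dict):
        explanation = decision.get("explanation", "no explanation provided")
    else:
        explanation = str(decision)
    # Defer to a human (or policy) callback for anything the model flagged.
    return confirm(explanation)
```

In an interactive tool the callback might prompt the user, for example `should_execute(args, lambda why: input(f"{why} Proceed? [y/N] ").lower() == "y")`, while an unattended pipeline could simply pass `lambda why: False` to refuse every flagged action.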

Platform-Agnostic Control

While the current implementation focuses on Android via ADB, the Gemini API's mobile environment is platform-agnostic. The model's output remains consistent regardless of the underlying OS. To extend this to iOS, developers can replace the ADBBridge with tools like simctl for simulators or go-ios for physical devices, maintaining the same core agent loop logic.
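To make the swap between platforms concrete, the agent loop can depend on a small interface rather than on ADBBridge directly. The sketch below is an assumption about how that could be structured: DeviceBridge, its method names, RecordingBridge, and perform are all hypothetical, and a real iOS bridge would implement the same methods on top of simctl or go-ios.

```python
from typing import Protocol


class DeviceBridge(Protocol):
    """Minimal surface the agent loop needs from any platform (hypothetical)."""
    def screenshot(self) -> bytes: ...
    def tap(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


class RecordingBridge:
    """In-memory stand-in that records calls instead of touching a device."""
    def __init__(self) -> None:
        self.calls: list = []

    def screenshot(self) -> bytes:
        self.calls.append(("screenshot",))
        return b""

    def tap(self, x: int, y: int) -> None:
        self.calls.append(("tap", x, y))

    def type_text(self, text: str) -> None:
        self.calls.append(("type_text", text))


def perform(bridge: DeviceBridge, name: str, args: dict) -> None:
    """Route a model action to whichever bridge is plugged in."""
    if name == "click":
        bridge.tap(args["x"], args["y"])
    elif name == "type":
        bridge.type_text(args["text"])
    else:
        raise ValueError(f"unsupported action: {name}")


bridge = RecordingBridge()
perform(bridge, "click", {"x": 540, "y": 1200})
perform(bridge, "type", {"text": "hello"})
print(bridge.calls)  # [('tap', 540, 1200), ('type_text', 'hello')]
```

Because perform only relies on the protocol's methods, an Android bridge and an iOS bridge can be exchanged without changing the loop, which is the property the section describes.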
