Deploy ADK Multimodal Agent with Gemini 3.1 on Lightsail

Clone the repo and run make commands to set up the Python/Node environment, build and test a multimodal ADK agent locally with Gemini 3.1 Flash Live, then deploy to Lightsail for real-time audio/video streaming without JSON overhead.

Streamlined Environment Setup for Reproducible Builds

Install pyenv to manage Python 3.13 versions across platforms, ensuring consistent ML/AI library support: git clone https://github.com/pyenv/pyenv. Use nvm for Node.js: git clone https://github.com/nvm-sh/nvm. Install Gemini CLI via npm install -g @google/gemini-cli and authenticate with Google account or API key. Clone the gemini31-lightsail repo from https://github.com/xbill9/gemini-cli-aws, then source init.sh (or set_env.sh for re-auth) to configure PROJECT_ID and other vars. This creates a minimal viable setup for ADK agents using Gemini Live API, avoiding version conflicts that plague Python deployments.

Build the frontend with make frontend (Vite: npm install && npm run build), producing dist/assets/index-*.js (214 kB) and CSS (21 kB). Test the mock UI server via make mock at http://127.0.0.1:8080/ to validate browser multimedia without model calls.
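The mock-server step above amounts to serving the built static assets with no model backend. A minimal sketch of that pattern using only Python's standard library (the directory name and port are illustrative, not the repo's actual configuration):

```python
import http.server
import socketserver
from functools import partial

def serve_static(directory: str, port: int = 8080) -> socketserver.TCPServer:
    """Serve a built frontend directory (e.g. frontend/dist) with no model
    backend, mimicking what a mock UI server does for browser multimedia
    testing. Directory and port defaults here are illustrative."""
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return socketserver.TCPServer(("127.0.0.1", port), handler)
```

Usage: srv = serve_static("frontend/dist"); srv.serve_forever() then open http://127.0.0.1:8080/ in a browser.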

Local Testing Validates Multimodal Capabilities

Verify the ADK install with make testadk, which runs the biometric_agent CLI (adk run biometric_agent) and responds to 'hello' with 'Scanner Online'. Test the full web interface via make adk at http://127.0.0.1:8000/ (add --allow_origins 'regex:.*' for Cloud Shell CORS). Lint with make lint (Ruff checks 10 files; ESLint covers the frontend). Run pytest via make test: 8 tests pass in 2.59s (biometric_agent, live_connection, ws_backend_v2).
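The smoke test above reduces to a simple contract: greeting the agent returns 'Scanner Online'. A pytest-style sketch with a stub standing in for the real ADK agent (the stub class and its respond method are hypothetical; the repo's actual tests are not shown in this summary):

```python
class StubBiometricAgent:
    """Hypothetical stand-in for the repo's biometric_agent,
    modeling only the greeting contract the smoke test checks."""

    def respond(self, message: str) -> str:
        if message.lower() == "hello":
            return "Scanner Online"
        return "Unrecognized input"

def test_greeting():
    # Mirrors `make testadk`: 'hello' must yield 'Scanner Online'.
    assert StubBiometricAgent().respond("hello") == "Scanner Online"
```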

Launch the full app with make run (sources biosync.sh, 2.0 FPS, 10s heartbeat) at http://127.0.0.1:8080/, serving static files from frontend/dist. This confirms real-time audio/video streaming with a client-side Worklet for off-main-thread processing, raw binary streams (no JSON wrapper overhead), and CLI detection to skip Live model errors.
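The "no JSON wrapper overhead" point can be made concrete: wrapping PCM audio in JSON forces base64 encoding (+33%) plus envelope keys, while a raw binary frame needs only a small fixed header. A sketch assuming a hypothetical frame layout (the repo's actual wire format is not shown here):

```python
import base64
import json
import struct

def json_frame(pcm: bytes) -> bytes:
    # JSON envelope: bytes must be base64-encoded to fit in a string.
    return json.dumps({"type": "audio", "data": base64.b64encode(pcm).decode()}).encode()

def binary_frame(pcm: bytes, kind: int = 1) -> bytes:
    # Illustrative raw frame: 1-byte kind tag + 4-byte big-endian length + raw PCM.
    return struct.pack(">BI", kind, len(pcm)) + pcm

pcm = bytes(3200)  # 100 ms of 16-bit mono audio at 16 kHz
print(len(json_frame(pcm)), len(binary_frame(pcm)))
```

For this 3200-byte chunk the binary frame adds 5 bytes of overhead, while the JSON wrapper adds over a kilobyte, a gap that compounds at streaming rates.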

One-Command Lightsail Deployment and Gemini 3.1 Adaptations

Deploy via make deploy (runs save-aws-creds.sh, deploy-lightsail.sh): creates container service visible in Lightsail console (https://lightsail.aws.amazon.com/ls/webapp/home/containers). Check make status (ACTIVE/DEPLOYING), get endpoint with make endpoint (e.g., https://biometric-scout-service.6wpv8vensby5c.us-east-1.cs.amazonlightsail.com/). Access UI for live multimodal interactions: audio/video processed by Gemini 3.1 Flash Live.

Key upgrades from the original Google codelab: switch from Vertex AI (PROJECT_ID/REGION) to the Gemini API (API key only); add a monkey-patch translation layer for ADK's partial Gemini 3.1 Live support (see GEMINI.md for related GitHub issues); re-architect the protocol for raw audio/video; move client audio to a Worklet; extend the ADK CLI for Live models. This enables low-latency, emotionally aware speech in 200+ countries and improves on prior setups for real-time bidirectional streaming.
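The monkey-patch translation layer mentioned above follows a standard pattern: wrap a library method so messages are translated before the original code runs. A generic sketch with entirely hypothetical names (the repo's real patch targets ADK internals this summary does not show):

```python
import functools

class LiveClient:
    """Hypothetical stand-in for the ADK class being patched."""

    def send(self, message: dict) -> dict:
        return message  # pretend this forwards to the Live API

def patch_send(cls) -> None:
    """Monkey-patch cls.send with a translation shim: rewrite fields the
    newer model expects before delegating to the original method."""
    original = cls.send

    @functools.wraps(original)
    def translated_send(self, message: dict) -> dict:
        if "text" in message:
            message = dict(message)  # copy; don't mutate the caller's dict
            message["parts"] = [{"text": message.pop("text")}]
        return original(self, message)

    cls.send = translated_send

patch_send(LiveClient)
```

The shim copies the message, rewrites only the incompatible field, and leaves every other call path untouched, which is what makes monkey-patching viable as a stopgap until upstream ADK support lands.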

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge