ADK Memory Bank: Long-Term Multimodal AI Agent Memory
Implement persistent, semantically searchable memory for AI agents using the Agent Development Kit (ADK) with Vertex AI Memory Bank on Google Cloud. The memory covers text, images, audio, and video across sessions, and enables personalized responses through automatic fact extraction and retrieval.
Distinguish SessionService for Short-Term Chats from MemoryService for Long-Term Archives
SessionService handles active conversations: it holds the short-term state of a live chat so you can resume it, but that state doesn't persist across restarts. MemoryService acts as a long-term filing cabinet, archiving facts from multiple sessions and media types (text, images, audio, video) for later retrieval. For quick tests, use the simple in-memory MemoryService, which matches by keyword and resets on every restart. Switch to Vertex AI Memory Bank for production: it stores memories in the cloud, uses Gemini to extract facts, generates embeddings for semantic search (e.g., "two-wheeled vehicle" matches "bicycle"), and organizes memories by topics such as user preferences or travel experiences. In other words, Memory Bank goes beyond simple storage: it extracts useful facts from the content and makes them searchable by meaning.
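The gap between keyword search and semantic search can be seen in a small sketch. This is a toy stand-in written for illustration, not the ADK in-memory MemoryService: the class `KeywordMemory`, its methods, and the sample facts are all hypothetical.

```python
import re


class KeywordMemory:
    """Toy in-memory store (hypothetical, not the ADK class).

    Facts live in a Python list, so they vanish when the process exits,
    and search only matches facts that share a literal word with the query.
    """

    def __init__(self):
        self._facts = []  # list of (user_id, fact) tuples

    def add(self, user_id, fact):
        self._facts.append((user_id, fact))

    @staticmethod
    def _words(text):
        # Lowercase word tokens; punctuation and hyphens act as separators.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def search(self, user_id, query):
        query_words = self._words(query)
        return [
            fact
            for uid, fact in self._facts
            if uid == user_id and query_words & self._words(fact)
        ]


memory = KeywordMemory()
memory.add("user-1", "Rides a bicycle to work")
memory.add("user-1", "Enjoys seaside towns")

# A query that shares the literal word "bicycle" finds the fact.
keyword_hits = memory.search("user-1", "bicycle commute")

# A paraphrase shares no words with any stored fact, so nothing is found.
paraphrase_hits = memory.search("user-1", "two-wheeled vehicle")
```

The paraphrase query comes back empty because no stored fact contains "two", "wheeled", or "vehicle". An embedding-based search is what closes that gap.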
Set Up Memory Bank with Dual Models for Fact Extraction and Embedding
Configure an Agent Engine to power the Memory Bank by selecting two models: one (e.g., Gemini) extracts key facts from conversations or media, and the other embeds those facts for semantic similarity search. Define topics to categorize memories, and the backend turns raw inputs into a queryable knowledge base. Don't treat it as a plain database: it is a service that processes and indexes multimodal data so that agents can recall details like "historical building from photo" or "enjoys seaside from video."
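The shape of that pipeline (extract facts, assign a topic, embed, store) can be sketched without any cloud dependency. Everything below is an assumption made for illustration: `MemoryBankConfig`, `MemoryRecord`, `ingest`, the placeholder model names, and the rule-based `fake_*` functions are hypothetical stand-ins, not the Agent Engine configuration schema or real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemoryBankConfig:
    """Hypothetical config: two models plus the topics used to file memories."""
    extraction_model: str
    embedding_model: str
    topics: List[str] = field(default_factory=list)


@dataclass
class MemoryRecord:
    fact: str
    topic: str
    embedding: List[float]


def ingest(raw_input: str,
           config: MemoryBankConfig,
           extract: Callable[[str], List[str]],
           embed: Callable[[str], List[float]],
           classify: Callable[[str, List[str]], str]) -> List[MemoryRecord]:
    """Turn raw input into topic-tagged, embedded records."""
    records = []
    for fact in extract(raw_input):
        topic = classify(fact, config.topics)
        records.append(MemoryRecord(fact, topic, embed(fact)))
    return records


# Rule-based stand-ins for the two models (assumptions, not real model calls).
def fake_extract(raw: str) -> List[str]:
    # Treat each sentence as one "fact".
    return [s.strip() for s in raw.split(".") if s.strip()]


def fake_embed(fact: str) -> List[float]:
    # A meaningless 2-dimensional vector, only to show where embeddings go.
    return [float(len(fact)), float(fact.count(" "))]


TOPIC_KEYWORDS = {
    "user_preferences": ("likes", "enjoys"),
    "travel_experiences": ("visited", "trip"),
}


def fake_classify(fact: str, topics: List[str]) -> str:
    text = fact.lower()
    for topic in topics:
        if any(keyword in text for keyword in TOPIC_KEYWORDS.get(topic, ())):
            return topic
    return "uncategorized"


config = MemoryBankConfig(
    extraction_model="extraction-model-placeholder",
    embedding_model="embedding-model-placeholder",
    topics=["user_preferences", "travel_experiences"],
)
records = ingest(
    "User likes historical architecture. User visited a small town.",
    config, fake_extract, fake_embed, fake_classify,
)
```

Passing the extractor, embedder, and classifier in as functions keeps the two model roles visibly separate, which mirrors the dual-model setup described above.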
Ingest Sessions or Media Directly, Retrieve via PreloadMemoryTool
Save memories in two ways: (1) At session end, call addSessionToMemory to archive the full chat, including user messages, agent replies, and image/video/audio references; facts are extracted and stored automatically. (2) Upload directly from code, preloading from files with context (e.g., send an image plus a text description) to generate facts without a chat. For retrieval, add PreloadMemoryTool to the agent. It runs at the start of every turn, semantically searches the bank based on the new user message, and injects the most relevant facts (e.g., "user likes historical architecture, enjoys seaside, visited town") into the prompt. No custom agent logic is needed: the tool enriches the context automatically, which lets the agent suggest, for example, cultural destinations personalized from earlier multimodal shares.
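The retrieve-then-inject pattern that PreloadMemoryTool automates can be approximated in a few lines. This is a simplified sketch, not the tool's implementation: `embed`, `cosine`, `preload_memory`, and `build_prompt` are hypothetical names, and a bag-of-words vector stands in for a real embedding model, so it ranks by shared words rather than by meaning.

```python
import math
import re
from collections import Counter


def embed(text):
    # Stand-in "embedding": word counts. A real system would call an
    # embedding model here so that paraphrases also score as similar.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def preload_memory(user_message, facts, top_k=2):
    """Return up to top_k stored facts most similar to the new message."""
    query = embed(user_message)
    scored = [(cosine(query, embed(fact)), fact) for fact in facts]
    # sorted() is stable, so facts with equal scores keep insertion order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [fact for score, fact in ranked[:top_k] if score > 0]


def build_prompt(user_message, facts, top_k=2):
    """Prepend retrieved facts to the user message before the model sees it."""
    relevant = preload_memory(user_message, facts, top_k)
    if not relevant:
        return user_message
    memory_block = "\n".join(f"- {fact}" for fact in relevant)
    return f"Relevant memories:\n{memory_block}\n\nUser: {user_message}"


stored_facts = [
    "User likes historical architecture",
    "User enjoys the seaside",
    "User visited a small town",
    "User is allergic to peanuts",
]

prompt = build_prompt("Suggest a historical seaside destination", stored_facts)
```

Because the lookup happens before the model call on every turn, the agent's own instructions never need to mention memory; the relevant facts simply arrive as part of the prompt.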
Achieve Consistent, Personalized Agents Across Sessions
Combine three memory layers: Session/State for live chats, persistent sessions and user profiles for surviving restarts, and Memory Bank for cross-session recall. The demo shows the layers working together. Session A ingests a photo (a historical building), a video (the sea), and audio (a town). After a restart, Session B asks the agent to "suggest cultural destination based on prior shares," which triggers semantic retrieval and yields tailored recommendations. Follow the ADK codelab to replicate this and build agents that stay context-aware over days or weeks for customer service, assistants, or automation.