Auto-Update Memory with Callbacks
Callbacks intercept the agent's lifecycle—before/after agent runs, model calls, or tool executions—to inject custom logic that updates memory without complicating the agent's core instructions. This creates a 'spy' that tracks conversation details in real-time, like categorizing activities in a trip planner (cultural for museums, food for restaurants, outdoor for parks).
Implementation: Use an after_tool callback to detect which tool was called (e.g., the museum tool sets activity_type: 'cultural'), then write that to tool_context. The planner's instructions read last_activities from context and ban repeats, e.g., "If the last activity was cultural (museum), suggest food or outdoor instead." This keeps plans diverse without the agent managing state itself, and it supports dynamic nudges (such as avoiding duplicate attractions) while reducing token usage and prompt complexity.
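A minimal sketch of the pattern in plain Python, not the real ADK API: the tool names, the ACTIVITY_TYPES mapping, and the use of a plain dict in place of ADK's ToolContext state are all illustrative assumptions.

```python
# Illustrative sketch of the after-tool callback pattern.
# Assumptions: tool names, the category mapping, and a plain dict
# standing in for the framework's tool_context state.

ACTIVITY_TYPES = {
    "suggest_museum": "cultural",
    "suggest_restaurant": "food",
    "suggest_park": "outdoor",
}


def after_tool_callback(tool_name: str, state: dict) -> None:
    """Runs after every tool call; records the activity type in shared
    state so the agent never has to manage this bookkeeping itself."""
    activity = ACTIVITY_TYPES.get(tool_name)
    if activity:
        state.setdefault("last_activities", []).append(activity)


def next_allowed_categories(state: dict) -> set:
    """What the planner's instruction enforces: ban a repeat of the
    most recent category so consecutive suggestions stay diverse."""
    history = state.get("last_activities", [])
    banned = {history[-1]} if history else set()
    return {"cultural", "food", "outdoor"} - banned


# After the museum tool runs, 'cultural' is banned for the next turn.
state = {}
after_tool_callback("suggest_museum", state)
allowed = next_allowed_categories(state)
```

The agent's prompt stays short: it only reads last_activities from context, while the callback does the categorizing silently on every tool call.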
Structured Data via Custom Tools
Shift from unstructured chat logs to structured storage like databases for key facts (e.g., dietary preferences), enabling proactive recall across sessions. Agents call tools to read/write profiles, making interactions feel personalized without re-asking known info.
Build two tools: save_user_preference (input: a dict such as {'diet': 'vegan'}; writes it as a DB record) and recall_user_preferences (reads and returns the user's record). Instruct the agent: "Always call recall_user_preferences first; call save_user_preference whenever the user states a preference."
User flow: The first dinner request prompts a preference question; the user says "vegan," and the tool saves it. Restart the script and request dinner again: the agent recalls the vegan status, skips the question, and suggests a vegan restaurant directly. This scales to fuller profiles (allergies, budget), persists beyond single sessions, and outperforms text-only memory for queryable facts.
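The two tools above can be sketched with SQLite as the structured store. Everything here beyond the two tool names from the text (the table schema, the user_id parameter, the db_path default) is an assumption for illustration.

```python
import sqlite3

# Illustrative sketch: the two preference tools backed by SQLite.
# Assumptions: table schema, user_id keying, and the db_path parameter.


def _connect(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS preferences ("
        "  user_id TEXT, key TEXT, value TEXT,"
        "  PRIMARY KEY (user_id, key))"
    )
    return conn


def save_user_preference(user_id: str, prefs: dict, db_path: str = "prefs.db") -> None:
    """Tool the agent calls whenever the user states a preference,
    e.g. save_user_preference('u1', {'diet': 'vegan'})."""
    with _connect(db_path) as conn:
        conn.executemany(
            "INSERT OR REPLACE INTO preferences VALUES (?, ?, ?)",
            [(user_id, k, v) for k, v in prefs.items()],
        )


def recall_user_preferences(user_id: str, db_path: str = "prefs.db") -> dict:
    """Tool the agent calls first, before answering; returns {} for a
    new user, which signals the agent to ask instead of assume."""
    with _connect(db_path) as conn:
        rows = conn.execute(
            "SELECT key, value FROM preferences WHERE user_id = ?", (user_id,)
        ).fetchall()
    return dict(rows)
```

Because the records live in a database file rather than the chat log, a fresh process (the "restart the script" step) recalls the same facts, and structured keys stay queryable in a way summarized text is not.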
Multimodal Memory for Non-Text Inputs
Extend memory beyond text to photos, videos, and audio by storing media in a 'memory bank' and providing a preload_memory tool for agents to load relevant files into conversations. This lets agents infer preferences from 'vibes' like a beach photo implying relaxation travel.
The code stores session media (picture/video/audio) directly in the bank. The agent uses preload_memory to fetch and analyze it; for example, after the uploads, query "Based on the picture, video, and audio I shared, where should I travel?" and the agent connects the dots across modalities for tailored suggestions like beach destinations. Trade-off: this increases storage costs but enables richer context than text summaries, mimicking human multi-sensory recall for travel, shopping, or creative apps.
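The memory-bank-plus-preload shape can be sketched as below. Only the preload_memory name comes from the text; the MemoryBank class, its store method, and the per-item summary field are illustrative assumptions (a real setup would persist the media files and generate the summaries with a model).

```python
from dataclasses import dataclass, field

# Illustrative sketch of a multimodal memory bank. Assumptions: the
# MemoryBank class, the (modality, uri, summary) record shape, and
# summaries written by hand here instead of by a model.


@dataclass
class MemoryBank:
    items: list = field(default_factory=list)

    def store(self, modality: str, uri: str, summary: str) -> None:
        """Persist a reference to a media item plus a short summary
        the agent can reason over without reloading the raw bytes."""
        self.items.append({"modality": modality, "uri": uri, "summary": summary})


def preload_memory(bank: MemoryBank, modalities=("image", "video", "audio")) -> list:
    """Tool: load stored media records into the conversation context
    so the agent can connect signals across modalities."""
    return [item for item in bank.items if item["modality"] in modalities]


# The beach 'vibe' lives in the stored summaries, not in any chat text.
bank = MemoryBank()
bank.store("image", "uploads/beach.jpg", "sunny beach, relaxed vibe")
bank.store("audio", "uploads/waves.mp3", "sound of ocean waves")
context = preload_memory(bank)
```

When the user later asks "where should I travel?", the agent answers from the preloaded records, which is where the cross-modal inference (beach photo plus wave audio implies relaxation travel) happens.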
These patterns build on the basics (session, multi-agent, and persistent state) to produce production agents that impress by remembering conversations, profiles, and media. Try them via the linked codelab and repo, built with the Agent Development Kit.