Hybrid Memory Combines Vector and Keyword Search via RRF
Store facts as embedded chunks with metadata (e.g., category: 'user_pref') using OpenAI's text-embedding-3-small, normalized to unit vectors. Maintain a live BM25Okapi index over the tokenized text (lowercase alphanumerics only). Retrieve top_k=5 by computing cosine similarities for semantics and BM25 scores for keywords, then fuse the two rank lists with Reciprocal Rank Fusion: score = 1/(60 + vec_rank) + 1/(60 + kw_rank). This catches exact matches that embeddings miss (e.g., "order 4821" retrieves via BM25 despite low cosine) as well as semantic queries (e.g., "consensus algorithm" pulls up Raft via vectors). Results include id, text, metadata, rrf_score, cosine, and bm25 for transparency; all memories can be dumped, or searched directly, for inspection.
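The fusion step can be sketched as follows; `vec_ranked` and `kw_ranked` are illustrative names for doc-id lists already ordered by cosine similarity and BM25 score respectively, and ranks are assumed to be 1-based:

```python
def rrf_fuse(vec_ranked, kw_ranked, k=60, top_k=5):
    """Reciprocal Rank Fusion: score = 1/(k + vec_rank) + 1/(k + kw_rank).

    A doc missing from one ranked list simply contributes nothing
    for that list, so exact-match-only or semantic-only hits still score.
    """
    scores = {}
    for ranked in (vec_ranked, kw_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first, truncated to top_k.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# "m1" is a near-miss on embeddings (rank 3) but an exact BM25 hit (rank 1),
# so fusion surfaces it first.
fused = rrf_fuse(vec_ranked=["m2", "m7", "m1"], kw_ranked=["m1", "m5"])
```

Appearing in both lists, even at middling ranks, beats a single top rank in one list, which is exactly the behavior that rescues "order 4821"-style queries.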
Trade-off: in-memory only, and the BM25 index is rebuilt on every store (fine for <1000 chunks); scale by swapping in another MemoryBackend implementation.
Autonomous Loop with Persona-Driven Tool Dispatch
The agent owns the history, memory, tools dict, and LLM (gpt-4o-mini, temp=0.2). Per user message: search memory with top_k=3 and inject the hits as context into the persona's system prompt (which compiles traits like "Methodical" and goals like "Use tools proactively", and forbids refusals like "I cannot"). Loop up to 8 rounds: call the LLM with tool schemas (OpenAI function spec), parse tool_calls, execute them (e.g., memory_store, a calculator that safe-evals math functions, a mock web_search), and append each tool result by id. The loop stops on a plain text reply.
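The dispatch loop can be sketched as below; `llm_call` stands in for the chat-completions request, and the message dicts follow the OpenAI tool-call shape (an assumption, since the exact wire format depends on the client library version):

```python
import json

def run_agent(llm_call, tools, messages, max_rounds=8):
    """Tool-dispatch loop: call the LLM, execute any tool_calls it emits,
    append results by tool_call_id, repeat until a plain text reply.

    `llm_call(messages)` is assumed to return a message dict with either
    "content" (final text) or "tool_calls"; `tools` maps name -> callable.
    """
    for _ in range(max_rounds):
        msg = llm_call(messages)
        messages.append(msg)
        if not msg.get("tool_calls"):
            return msg.get("content", "")  # plain text reply: stop looping
        for call in msg["tool_calls"]:
            fn = tools[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": json.dumps(fn(**args))})
    return "(max rounds reached)"

# Stubbed LLM: one calculator round, then a final text reply.
scripted = iter([
    {"role": "assistant", "tool_calls": [{"id": "c1", "function": {
        "name": "calculator", "arguments": '{"expr": "22 * 6.5"}'}}]},
    {"role": "assistant", "content": "143 hours remain."},
])
reply = run_agent(lambda msgs: next(scripted),
                  {"calculator": lambda expr: eval(expr, {"__builtins__": {}})},
                  [])
```

Capping at max_rounds bounds cost and prevents a tool-calling livelock if the model never produces a text reply.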
Tools auto-register their schemas with typed params (e.g., memory_search: query str, top_k int). The persona enforces consistency: reason step-by-step, quote memory IDs, stay concise. Tools hot-swap at runtime (e.g., upgrade the web_search KB with an "lsm-tree" snippet) via register_tool; no restart needed.
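A minimal sketch of such a registry, assuming register_tool takes a callable plus an OpenAI-style JSON-schema parameter spec (the class and method names here are illustrative, not the project's actual API):

```python
class ToolRegistry:
    """Tools carry an OpenAI function-spec schema and can be replaced
    at runtime without restarting the agent."""

    def __init__(self):
        self._tools = {}

    def register_tool(self, name, fn, description, parameters):
        # Re-registering under the same name hot-swaps the tool in place.
        self._tools[name] = {
            "fn": fn,
            "schema": {"type": "function",
                       "function": {"name": name,
                                    "description": description,
                                    "parameters": parameters}}}

    def schemas(self):
        """List passed straight into the LLM call's tools parameter."""
        return [t["schema"] for t in self._tools.values()]

    def call(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

reg = ToolRegistry()
reg.register_tool(
    "memory_search", lambda query, top_k=3: [],
    "Search hybrid memory",
    {"type": "object",
     "properties": {"query": {"type": "string"},
                    "top_k": {"type": "integer"}},
     "required": ["query"]})
# Upgrade: swap in a richer mock web_search KB mid-session.
reg.register_tool(
    "web_search", lambda query: "lsm-tree: write-optimized storage ...",
    "Mock web search over a static KB",
    {"type": "object",
     "properties": {"query": {"type": "string"}},
     "required": ["query"]})
```

Because schemas are regenerated from the dict on each LLM call, the model sees the upgraded tool on the very next round.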
Interfaces (ABCs: MemoryBackend, LLMProvider, Tool) enable swaps: plug in Anthropic for the LLM or Pinecone for memory without changing the agent.
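The seams might look like this; the method names below are assumptions chosen to match the operations described above, not the project's exact signatures:

```python
from abc import ABC, abstractmethod

class MemoryBackend(ABC):
    """Swap point: in-memory hybrid store today, Pinecone or similar later."""
    @abstractmethod
    def store(self, text: str, metadata: dict) -> str:
        """Persist a chunk, return its id (e.g., 'mem_0003')."""
    @abstractmethod
    def search(self, query: str, top_k: int = 5) -> list:
        """Hybrid retrieval; each hit carries rrf_score, cosine, bm25."""

class LLMProvider(ABC):
    """Swap point: OpenAI today, Anthropic later."""
    @abstractmethod
    def chat(self, messages, tools=None):
        """Return an assistant message, possibly containing tool_calls."""

class Tool(ABC):
    @abstractmethod
    def schema(self) -> dict:
        """OpenAI function-spec schema advertised to the LLM."""
    @abstractmethod
    def run(self, **kwargs):
        """Execute with the parsed tool-call arguments."""
```

The agent only ever touches these three interfaces, so a backend swap is a constructor-argument change, not a rewrite.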
Demos Prove Recall, Reasoning, and Persistence
Pre-seed 7 facts (e.g., "VelocityDB uses Raft", deadline March 31). The query "What consensus algorithm does VelocityDB use?" yields mem_0003 (cosine=0.847, bm25=1.23, rrf=0.03328). In chat, the agent recalls the project, deadline, and Raft; finds order #4821 (32GB RAM); and computes 22 days * 6.5h = 143h left via the calculator's safe eval over the math library. It stores new facts autonomously (e.g., the switch to B-tree), recalls them the next turn, and explains the B-tree fit via the upgraded tool (read-optimized vs. write-heavy LSM). A full dump verifies all 8 chunks persisted across turns.
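The calculator's safe eval can be sketched roughly as below: a namespace whitelisted to the math module, with builtins stripped. This is an assumption about the implementation, and note that eval with a stripped namespace blocks name lookups but is not a hardened sandbox (an AST whitelist would be stricter):

```python
import math

# Whitelist: math functions and constants only, no builtins.
_ALLOWED = {name: getattr(math, name)
            for name in dir(math) if not name.startswith("_")}

def calculator(expr: str) -> float:
    """Evaluate an arithmetic expression restricted to the math library."""
    return eval(expr, {"__builtins__": {}}, _ALLOWED)

# The demo's deadline arithmetic: 22 working days at 6.5 focused hours/day.
hours_left = calculator("22 * 6.5")  # 143.0
```

Delegating arithmetic to a tool sidesteps LLM calculation errors; the model only has to translate "days times hours" into an expression string.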
This modular design persists state, reasons over history plus memory, acts via tools, and extends without core rewrites; it is ready for production once a real vector DB is swapped in.