Vector RAG Fails: Tree Navigation Hits 98.7% Accuracy

Standard vector RAG relies on semantic similarity, which often misfires on precise queries; building a document tree (a smart table of contents) and letting an LLM navigate it reaches 98.7% accuracy on FinanceBench, versus 30-50% for standard vector RAG.

Semantic Similarity Mismatch Crushes Precise Retrieval

Vector databases power RAG by chunking documents, embedding each chunk as a vector, and retrieving the chunks most semantically similar to a query—a default since 2022 that now backs a $50B industry. But similarity is not relevance: the query "what does Table A2.1.1 say?" pulls in other tables rather than the target, and "what questions does Chapter 2 answer?" grabs other chapters while missing Chapter 2's own Questions section. This category error—asking "what sounds similar?" instead of "where should I look?"—caps accuracy at 30-50% on benchmarks like FinanceBench.
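A toy illustration of the failure mode, using bag-of-words cosine similarity as a stand-in for a real embedding model (the chunk texts and names here are invented for the example): a chunk that repeats the word "table" can outscore the table the query actually names.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts (stand-in for a neural embedder).
    return Counter(re.findall(r"[a-z0-9.]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical chunks from a structured report.
chunks = {
    "table_a211": "Table A2.1.1 GDP growth by region",
    "table_b3": "Table B3 table of contents table notes table summary",
    "ch2_questions": "Chapter 2 Questions review exercises",
}

query = "what does table a2.1.1 say?"
qv = embed(query)
sims = {name: cosine(qv, embed(text)) for name, text in chunks.items()}
best = max(sims, key=sims.get)
# "table_b3" wins on similarity even though the query names Table A2.1.1:
# its repeated "table" tokens outscore the exact identifier match.
```

The point is structural: similarity ranks by shared vocabulary, so surface word overlap can beat an exact reference to the target location.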

Proxy Pointer RAG Fixes It with Structural Trees

PageIndex replaces the vector index with a tree-based one: a smart table of contents mirroring the document's structure (pages, sections, tables). An LLM navigates this tree, deciding which node to explore next based on the query and using 'proxy pointers' to zero in on exact locations. Two engineers implemented this for $0, demonstrating it on a World Bank PDF where standard RAG failed but tree navigation succeeded.
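A minimal sketch of the navigation idea, not PageIndex's actual API: the tree mirrors the document's TOC, and a stubbed keyword-overlap scorer stands in for the LLM's "which node should I explore next?" decision. The `Node`, `choose_child`, and `navigate` names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str            # TOC entry (section, chapter, or table title)
    text: str = ""        # page content at leaf nodes
    children: list = field(default_factory=list)

def subtree_titles(node):
    # Titles visible under a node, like the TOC slice an LLM would read.
    yield node.title
    for child in node.children:
        yield from subtree_titles(child)

def choose_child(query, children):
    # Stand-in for the LLM decision: pick the child whose subtree titles
    # best match the query.
    q = set(query.lower().split())
    def score(child):
        return max(len(q & set(t.lower().split())) for t in subtree_titles(child))
    return max(children, key=score)

def navigate(query, node):
    # Walk the tree from the root to a leaf, one decision per level.
    while node.children:
        node = choose_child(query, node.children)
    return node

# Hypothetical document tree for a structured report.
doc = Node("Report", children=[
    Node("Chapter 1", children=[Node("Table A1.1", text="...")]),
    Node("Chapter 2", children=[
        Node("Questions", text="Chapter 2 review questions"),
        Node("Table A2.1.1", text="GDP growth by region"),
    ]),
])

hit = navigate("what does Table A2.1.1 say", doc)
# Navigation lands on the exact node the query names: Chapter 2 > Table A2.1.1.
```

Because each step asks "where should I look?" against the document's own hierarchy, an exact reference like "Table A2.1.1" resolves to its location instead of competing on fuzzy similarity.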

98.7% Accuracy Proves Trees Beat Vectors

On FinanceBench, PageIndex delivers 98.7% accuracy—roughly 2-3x standard vector RAG's 30-50%. It handles structured documents (tables, chapters) precisely by leveraging hierarchy rather than fuzzy similarity, closing a five-year gap in document search without an expensive vector database.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge