PageIndex: Vectorless RAG via LLM Tree Reasoning

PageIndex builds hierarchical document trees with section summaries, enabling LLMs to reason over structure for precise retrieval without embeddings. This boosts retrieval accuracy on complex documents, as demonstrated on benchmarks like FinanceBench.

Hierarchical Tree Beats Vector Similarity for Complex Docs

Traditional RAG falls short on long documents such as research papers and financial reports: vector similarity misses relevance that spans sections, which demands structure-aware reasoning. PageIndex addresses this by parsing a PDF into a table-of-contents tree in which each node holds a title, an LLM-generated summary, a page index, and the full section text. At retrieval time the full text is stripped, and the tree JSON (titles plus summaries) is fed to an LLM such as GPT-4o, which returns step-by-step reasoning together with the relevant node IDs. This grounds retrieval in the document's hierarchy rather than proxy embeddings, yielding interpretable retrieval paths (e.g., "self-attention motivation spans the Background and Model sections due to the recurrence limits discussed there").
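
As a minimal sketch, one node of such a tree might look like the dictionary below. The exact PageIndex schema is not shown in the source, so the field names ("node_id", "title", "summary", "page_index", "text", "nodes") are illustrative assumptions:

```python
# Illustrative node shape for a PageIndex-style ToC tree; field names are
# assumptions, not the verified PageIndex schema.
node = {
    "node_id": "0006",
    "title": "3.1 Self-Attention",
    "page_index": 4,
    "summary": "Introduces scaled dot-product and multi-head attention, "
               "motivating them against recurrent layers.",
    "text": "<full section text, stripped before the retrieval prompt>",
    "nodes": [],  # child subsections, recursively nested
}
```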

For the Transformer paper ("Attention Is All You Need"), PageIndex infers nodes such as "1 Introduction" and "3.1 Self-Attention", preserving the authors' own structure. The query "Why self-attention over recurrence, complexity trade-offs?" triggers LLM reasoning: it flags Background (recurrence flaws), Model (where self-attention is introduced), and the complexity comparison (O(n²·d) per layer for self-attention vs. O(n·d²) for recurrence), avoiding irrelevant chunks.
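
For illustration, the retrieval step for this query might return something like the following, in the {"thinking": ..., "node_list": [...]} format described in step 2 below; the node IDs and wording here are hypothetical:

```json
{
  "thinking": "Recurrence limitations are motivated in Background; self-attention is defined in 3.1; the per-layer complexity trade-off appears in the comparison section.",
  "node_list": ["0002", "0006", "0009"]
}
```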

Three-Step Pipeline: Index Once, Query Reusably

  1. Index: Submit the PDF to the PageIndex API; it builds the tree automatically (poll is_retrieval_ready until the document is ready). Python: pi_client.submit_document(pdf_path), then pi_client.get_tree(doc_id, node_summary=True).
  2. Retrieve: Prompt the LLM with the tree JSON and the query, asking for {"thinking": "...", "node_list": ["node_id_1", ...]}. Strip the full text first via utils.remove_fields(tree, ["text"]). Cost: a single LLM call per query, no embeddings.
  3. Generate: Fetch the full text of the selected nodes, label each as "Section: Title\nText", and prompt for a structured answer (e.g., motivations, numbers, caveats). The tree is reused at zero indexing cost: a second query like "multi-head attention, scaling role?" hits only "3.1 Self-Attention", which explains the 8 attention heads and the 1/√d_k scaling that keeps dot products from saturating the softmax. A combined sketch of all three steps follows this list.
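
Here is an end-to-end sketch of the three steps. The client and helper names (submit_document, get_tree, is_retrieval_ready, utils.remove_fields) come from the article; the import paths, exact signatures, tree-walking helper, and the assumption that remove_fields returns a stripped copy are mine, not the verified SDK surface:

```python
import json
import time

from openai import OpenAI
from pageindex import PageIndexClient, utils  # import path assumed

pi_client = PageIndexClient(api_key="YOUR_KEY")
llm = OpenAI()

# 1. Index once: submit the PDF and poll until the tree is built.
doc_id = pi_client.submit_document("attention_is_all_you_need.pdf")
while not pi_client.is_retrieval_ready(doc_id):
    time.sleep(5)
tree = pi_client.get_tree(doc_id, node_summary=True)

# 2. Retrieve: strip full text so the LLM reasons over titles + summaries.
slim_tree = utils.remove_fields(tree, ["text"])  # assumed to return a copy
query = "Why self-attention over recurrence, complexity trade-offs?"
retrieval = llm.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # deterministic node selection
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": (
            f"Document tree:\n{json.dumps(slim_tree)}\n\n"
            f"Query: {query}\n"
            'Return JSON: {"thinking": "...", "node_list": ["node_id", ...]}'
        ),
    }],
)
node_ids = json.loads(retrieval.choices[0].message.content)["node_list"]

# 3. Generate: fetch full text only for the selected nodes, then answer.
def walk(node):  # assumed tree shape: children nested under "nodes"
    yield node
    for child in node.get("nodes", []):
        yield from walk(child)

by_id = {n["node_id"]: n for n in walk(tree)}
context = "\n\n".join(
    f"Section: {by_id[i]['title']}\n{by_id[i]['text']}" for i in node_ids
)
answer = llm.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{"role": "user",
               "content": f"{context}\n\nAnswer the query: {query}"}],
)
print(answer.choices[0].message.content)
```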

The full implementation uses the pageindex and openai libraries, issuing async LLM calls at temperature=0 for determinism (a minimal async sketch follows). The tree is ready in minutes; after indexing, queries reuse it with no further preprocessing.
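
A minimal sketch of the async pattern the article describes, using the standard AsyncOpenAI client; the prompt construction is elided and the queries are illustrative:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def ask(prompt: str) -> str:
    # temperature=0 keeps retrieval and generation deterministic
    resp = await client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    # multiple queries against the same cached tree can run concurrently
    answers = await asyncio.gather(
        ask("Why self-attention over recurrence?"),
        ask("Multi-head attention, scaling role?"),
    )
    print(answers)

asyncio.run(main())
```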

Proven Gains: Precision on Benchmarks, No Vector Overhead

PageIndex excels where vectors falter: multi-hop queries that span sections of professional documents. On FinanceBench it shows superior retrieval accuracy, with traceable reasoning instead of black-box cosine scores. Trade-offs: each query spends an LLM call's worth of latency (mitigable by caching node selections, sketched below), while the deterministic tree eliminates re-indexing and re-chunking. It fits precision-critical domains and scales by reusing one tree across many queries. A GitHub notebook demos the full flow on the arXiv Transformer PDF.
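
One way to hide the per-query latency noted above is to memoize the LLM's node selection per (doc_id, query); this cache layer is an illustration, not part of PageIndex, and retrieve_node_ids stands in for the step-2 call sketched earlier:

```python
from functools import lru_cache

def retrieve_node_ids(doc_id: str, query: str) -> list[str]:
    # wraps the step-2 tree-reasoning LLM call; stubbed here
    ...

@lru_cache(maxsize=1024)
def cached_node_ids(doc_id: str, query: str) -> tuple[str, ...]:
    # tuple keeps the cached value hashable and immutable
    return tuple(retrieve_node_ids(doc_id, query))
```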
