RAG-Anything + LightRAG Handles Images/Charts in PDFs

RAG-Anything extends LightRAG to process scanned PDFs, charts, and images: it parses documents locally with MinerU, splits the output into text and image buckets, extracts entities, relationships, and embeddings with GPT-4o-mini, and merges everything into a unified vector DB and knowledge graph for querying.

Local Parsing Extracts Components from Non-Text Docs

RAG-Anything addresses the limitation of text-only RAG systems like LightRAG by handling scanned PDFs, images, charts, and graphs. It uses MinerU, an open-source local tool, to parse documents into components: headers, text blocks, charts, images, and LaTeX equations. MinerU identifies these without understanding their content; it simply draws bounding boxes around each element.

Specialized local models then process components:

  • PaddleOCR extracts readable text from scanned blocks (e.g., "Company X reported strong Q3'23 results with revenue growth").
  • Charts and equations convert to text where possible.
  • Pure images (e.g., bar graphs) become screenshots.

This splits the output into two buckets, text and images, avoiding full-document OCR. Local processing on CPU (or GPU with PyTorch tweaks) keeps this stage free and fast, reducing LLM costs compared to screenshot-everything approaches.
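The bucket split above can be sketched in a few lines. The component dicts below mimic MinerU-style parser output, but the field names (`type`, `text`, `img_path`) are illustrative assumptions, not MinerU's exact schema:

```python
# Route parsed components into a text bucket and an image bucket,
# mirroring the text/image split described above.

def split_buckets(components):
    """Split parser components: OCR'd text and equations go to the text
    bucket; charts and other images stay as screenshot paths."""
    text_bucket, image_bucket = [], []
    for comp in components:
        if comp["type"] in ("text", "equation"):   # readable content
            text_bucket.append(comp["text"])
        elif comp["type"] == "image":              # kept as screenshots
            image_bucket.append(comp["img_path"])
    return text_bucket, image_bucket

components = [
    {"type": "text", "text": "Company X reported strong Q3'23 results"},
    {"type": "image", "img_path": "page1_chart.png"},
    {"type": "equation", "text": r"\frac{revenue}{quarter}"},
]
texts, images = split_buckets(components)
print(len(texts))   # 2
print(images)       # ['page1_chart.png']
```

Only the image bucket needs vision-capable LLM calls later, which is where the cost savings come from.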

Dual-Path LLM Processing Builds Embeddings and Knowledge Graphs

Text and image buckets feed into an LLM like GPT-4o-mini (or local Ollama) via separate prompts:

  • Text path: Prompt extracts entities, relationships (for knowledge graph), and embeddings (for vector DB).
  • Image path: LLM analyzes screenshots to extract the same—entities/relationships/embeddings.

From one document, this creates four artifacts: text embeddings, a text KG, image embeddings, and an image KG. RAG-Anything merges them by overlaying entities into a single vector DB and KG. This preserves context across modalities, enabling queries like "monthly revenue trend for Novatech Inc. Jan-Sep 2025" to pull bar chart data (e.g., Jan: $4.6M, Feb: $4.9M, etc.).
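The overlay step can be sketched as follows. The KG shape here (entity name mapped to a description, plus relation triples) is a deliberate simplification of LightRAG's actual storage:

```python
# Merge the text-derived and image-derived knowledge graphs: shared
# entities are overlaid (descriptions combined), relations are unioned.

def merge_kgs(text_kg, image_kg):
    entities = dict(text_kg["entities"])
    for name, desc in image_kg["entities"].items():
        if name in entities:
            # Same entity seen in both modalities: overlay, don't duplicate.
            entities[name] = entities[name] + " | " + desc
        else:
            entities[name] = desc
    relations = text_kg["relations"] + image_kg["relations"]
    return {"entities": entities, "relations": relations}

text_kg = {
    "entities": {"Novatech Inc.": "company discussed in the report"},
    "relations": [("Novatech Inc.", "reported", "Q3'23 results")],
}
image_kg = {
    "entities": {"Novatech Inc.": "subject of monthly revenue bar chart",
                 "Jan 2025 revenue": "$4.6M"},
    "relations": [("Novatech Inc.", "earned", "Jan 2025 revenue")],
}
merged = merge_kgs(text_kg, image_kg)
print(len(merged["entities"]))   # 2: shared entity overlaid once
print(len(merged["relations"]))  # 2
```

Because "Novatech Inc." ends up as one node carrying both its text and chart context, a query about its revenue trend can traverse into the chart-derived facts.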

Merging saves money and time: surgical local parsing minimizes LLM tokens compared with treating entire docs as images.

Integrate with LightRAG and Use via Claude Code Skills

RAG-Anything wraps LightRAG: ingest text docs via the LightRAG UI/API and non-text docs via a RAG-Anything script. Post-processing merges RAG-Anything's DB/KG with LightRAG's into one unified system. Querying is unchanged: use the LightRAG UI, the API, or Claude Code natural language (which auto-calls the query API).
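Querying the merged system over HTTP might look like the sketch below. It assumes LightRAG's server exposes `POST /query` with a `query`/`mode` JSON body on its default port 9621; verify both against your running version:

```python
# Build a request against the LightRAG server's query endpoint
# (endpoint path, body shape, and port are assumptions to verify).
import json
import urllib.request

def build_query_request(base_url: str, question: str, mode: str = "hybrid"):
    payload = json.dumps({"query": question, "mode": mode}).encode()
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request(
    "http://localhost:9621",
    "Monthly revenue trend for Novatech Inc. Jan-Sep 2025",
)
print(req.full_url)                  # http://localhost:9621/query
print(json.loads(req.data)["mode"])  # hybrid
# urllib.request.urlopen(req) would send it once the Docker container is up.
```

This is the same call path whether the underlying data came from a text upload or a RAG-Anything ingestion, which is the point of the unified store.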

Setup (one-shot Claude Code prompt in LightRAG dir):

  1. Updates storage path for existing Docker.
  2. Sets models: GPT-4o-mini (or nano), text-embedding-3-large (OpenAI).
  3. Fixes repo bugs like embedding double-wrap. Downloads MinerU/dependencies (heavier than LightRAG; CPU default, GPU optional).

Ingest non-text docs with the Claude Code skill, which runs the script: "use rag-anything skill to upload these docs/folder." It auto-restarts Docker and processes via MinerU → LLM → merge. Text uploads still go through the UI/skill.

Trade-offs: Script-only for non-text (no UI); CPU slow for large batches (GPU fix via Claude Code); minor OpenAI costs for LLM extraction. Result: Production RAG for real docs, cheaper than cloud alternatives.

Video description
⚡Master Claude Code, Build Your Agency, Land Your First Client⚡ https://www.skool.com/chase-ai
🔥FREE community🔥 https://www.skool.com/chase-ai-community/classroom/4fe79bd0?md=fc9896c946704869a1b2f4064454a558
💻 Need custom work? Book a consult 💻 https://chaseai.io

Let's unlock multimodal RAG with RAG-Anything. In this video, we build on our LightRAG base from yesterday, giving it the power to handle non-text documents with the RAG-Anything integration.

⏰TIMESTAMPS:
0:00 - Intro
0:48 - RAG Anything
3:22 - How it Works
13:11 - Install & Demo
18:19 - Final Thoughts

RESOURCES FROM THIS VIDEO:
➡️ Master Claude Code: https://www.skool.com/chase-ai
➡️ My Website: https://www.chaseai.io
➡️ LightRAG GH: https://github.com/hkuds/lightrag
➡️ RAG-Anything GH: https://github.com/HKUDS/RAG-Anything
➡️ MinerU: https://github.com/opendatalab/MinerU

#claudecode #lightrag #raganything

Summarized by x-ai/grok-4.1-fast via openrouter
© 2026 Edge