RAG-Anything + LightRAG Handles Images/Charts in PDFs
RAG-Anything extends LightRAG to process scanned PDFs, charts, and images: MinerU parses documents locally, the output is split into text and image buckets, GPT-4o-mini extracts entities, relationships, and embeddings from each, and everything is merged into a unified vector DB plus knowledge graph for querying.
Local Parsing Extracts Components from Non-Text Docs
RAG-Anything addresses the limitation of text-only RAG systems like LightRAG by handling scanned PDFs, images, charts, and graphs. It uses MinerU, an open-source local tool, to parse documents into components: headers, text blocks, charts, images, and LaTeX equations. MinerU identifies these without understanding their content; it simply draws bounding boxes around elements.
Specialized local models then process components:
- PaddleOCR extracts readable text from scanned blocks (e.g., "Company X reported strong Q3'23 results with revenue growth").
- Charts and equations are converted to text where possible (e.g., LaTeX for equations).
- Pure images (e.g., bar graphs) become screenshots.
This splits the output into two buckets, text and images, avoiding full-document OCR. Local processing on CPU (or GPU with a PyTorch tweak) keeps parsing free and fast, reducing LLM costs compared to screenshot-everything approaches.
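The bucketing step can be sketched as a simple router over parsed blocks. This is a minimal illustration, assuming a MinerU-style content list where each block carries a `type` field; the field names and block shapes here are illustrative, not MinerU's exact schema:

```python
# Route parsed blocks into text and image buckets.
# Block shapes are illustrative; MinerU's real content-list schema differs.

def split_buckets(blocks):
    """Split parsed document blocks into text and image buckets."""
    text_bucket, image_bucket = [], []
    for block in blocks:
        if block["type"] in ("text", "equation", "table"):
            # OCR'd text, LaTeX equations, and table markup stay textual.
            text_bucket.append(block["content"])
        elif block["type"] == "image":
            # Pure images (e.g., bar charts) are kept as screenshots.
            image_bucket.append(block["path"])
    return text_bucket, image_bucket

blocks = [
    {"type": "text", "content": "Company X reported strong Q3'23 results."},
    {"type": "equation", "content": r"\frac{revenue}{cost}"},
    {"type": "image", "path": "chart_p3.png"},
]
texts, images = split_buckets(blocks)
```

Only the image bucket needs vision-capable LLM calls later, which is where the cost savings come from.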
Dual-Path LLM Processing Builds Embeddings and Knowledge Graphs
Text and image buckets feed into an LLM such as GPT-4o-mini (or a local model via Ollama) through separate prompts:
- Text path: Prompt extracts entities, relationships (for knowledge graph), and embeddings (for vector DB).
- Image path: LLM analyzes screenshots to extract the same—entities/relationships/embeddings.
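The two paths differ mainly in their prompts. A hedged sketch of what they might look like; these templates are illustrative stand-ins, not RAG-Anything's actual prompts:

```python
# Illustrative prompt templates for the two extraction paths.
# RAG-Anything's real prompts are more elaborate; these show the shape only.

TEXT_PROMPT = (
    "Extract entities and relationships from the following text. "
    'Return JSON: {{"entities": [...], "relationships": [...]}}\n\n{chunk}'
)

IMAGE_PROMPT = (
    "You are given a screenshot of a chart or figure from a document. "
    "Describe it, then extract entities, relationships, and any data points "
    'as JSON: {"entities": [...], "relationships": [...]}'
)

def build_text_prompt(chunk: str) -> str:
    """Fill the text-path template with one parsed chunk."""
    return TEXT_PROMPT.format(chunk=chunk)

prompt = build_text_prompt("Novatech Inc. reported Q3'23 revenue growth.")
```

The image path sends `IMAGE_PROMPT` alongside the screenshot to a vision-capable model; embeddings are computed separately from the extracted text.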
From one document, this produces four artifacts: text embeddings, a text KG, image embeddings, and an image KG. RAG-Anything merges them by overlaying shared entities into a single vector DB and KG. This preserves context across modalities, enabling queries like "monthly revenue trend for Novatech Inc. Jan-Sep 2025" to pull bar chart data (e.g., Jan: $4.6M, Feb: $4.9M, etc.).
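The overlay merge can be sketched with plain dicts standing in for the real vector DB and graph store (the structures and entity names here are illustrative):

```python
def merge_graphs(text_kg, image_kg):
    """Overlay two {entity: set_of_related_facts} maps into one KG.
    An entity appearing in both modalities is unified under one node."""
    merged = {}
    for kg in (text_kg, image_kg):
        for entity, neighbors in kg.items():
            merged.setdefault(entity, set()).update(neighbors)
    return merged

# One entity discovered on both paths of the same document:
text_kg = {"Novatech Inc.": {"reported strong Q3'23 results"}}
image_kg = {"Novatech Inc.": {"Jan revenue $4.6M", "Feb revenue $4.9M"}}

kg = merge_graphs(text_kg, image_kg)
# "Novatech Inc." now links to facts from both the text and chart paths,
# so one query can traverse prose and chart-derived data together.
```

The vector DBs merge the same way: both embedding sets land in one index, so retrieval is modality-agnostic at query time.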
Merging saves money and time: the local scalpel-style parsing minimizes LLM tokens compared with treating entire documents as images.
Integrate with LightRAG and Use via Claude Code Skills
RAG-Anything wraps LightRAG: ingest text docs via the LightRAG UI/API, and non-text docs via a RAG-Anything script. A post-processing step merges RAG-Anything's DB/KG with LightRAG's into one unified system. Querying is unchanged: use the LightRAG UI, the API, or natural language in Claude Code (which auto-calls the query API).
Setup (one-shot Claude Code prompt in LightRAG dir):
- Updates storage path for existing Docker.
- Sets models: GPT-4o-mini (or nano), text-embedding-3-large (OpenAI).
- Fixes repo bugs like embedding double-wrap. Downloads MinerU/dependencies (heavier than LightRAG; CPU default, GPU optional).
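The model settings end up in the LightRAG server's environment config; a sketch of the kind of values involved (variable names can differ across LightRAG versions, so treat these as illustrative rather than exact):

```
LLM_BINDING=openai
LLM_MODEL=gpt-4o-mini
EMBEDDING_BINDING=openai
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIM=3072
OPENAI_API_KEY=sk-...
```

The one-shot Claude Code prompt writes these for you; they are shown here only so you can sanity-check the result.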
Ingest non-text docs via the Claude Code skill, which runs the script: "use rag-anything skill to upload these docs/folder." It auto-restarts Docker and processes via MinerU → LLM → merge. Text uploads still go through the UI or skill.
Trade-offs: script-only ingestion for non-text docs (no UI); CPU is slow for large batches (GPU fix via Claude Code); minor OpenAI costs for LLM extraction. The result: production RAG for real-world docs, cheaper than cloud alternatives.