OpenRAG: Extensible Stack for Agentic RAG
OpenRAG combines Docling for document parsing, OpenSearch for hybrid search, and Langflow for orchestration into an open-source baseline that supports agentic retrieval, local models, and easy customization for production RAG apps.
Why Start with OpenRAG's Opinionated Baseline
RAG remains complex because so many variables interact: PDF parsing quirks, chunking strategies, fast-evolving embedding models, and refinements such as chunk summaries, chunk expansion, cross-encoder re-ranking, and query rewriting, all tuned to your particular documents, users, and queries. Claims that "RAG is dead" or "solved" ignore these realities: long context windows don't eliminate the cost of feeding million-token datasets to a model, and naive pipelines (extract text, chunk, embed, store in a vector DB, retrieve top-k) routinely fail in production. OpenRAG provides a high-quality, extensible baseline built from three open-source projects: Docling (document processing), OpenSearch (search/indexing), and Langflow (visual orchestration/agents). It runs fully offline with local models such as IBM Granite 3B for generation and Qwen3 embedding models (0.6B–8B), supporting air-gapped setups. The stack enables agentic retrieval, where the LLM decides which searches and tools to run; this outperforms rigid top-k retrieval by handling multi-step queries dynamically.
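The contrast with naive top-k can be sketched as a loop in which the model, rather than the pipeline, decides when to search and when to answer. Everything here is illustrative: the corpus, the keyword retriever, and the `stub_llm` stand-in are made up, not OpenRAG's actual components.

```python
from typing import Callable

def keyword_search(corpus: dict[str, str], query: str) -> list[str]:
    """Toy retriever: return ids of docs whose text mentions the query term."""
    return [doc_id for doc_id, text in corpus.items() if query.lower() in text.lower()]

def agentic_answer(llm: Callable[[str, list[str]], str],
                   corpus: dict[str, str], question: str,
                   max_steps: int = 3) -> str:
    """Let the model drive retrieval: it emits SEARCH or ANSWER decisions."""
    evidence: list[str] = []
    for _ in range(max_steps):
        decision = llm(question, evidence)
        if decision.startswith("SEARCH:"):
            term = decision.removeprefix("SEARCH:").strip()
            evidence.extend(corpus[h] for h in keyword_search(corpus, term))
        else:  # "ANSWER: ..."
            return decision.removeprefix("ANSWER:").strip()
    return "no answer within step budget"

# Deterministic stand-in for an LLM: search once, then answer from evidence.
def stub_llm(question: str, evidence: list[str]) -> str:
    if not evidence:
        return "SEARCH: refund"
    return "ANSWER: " + evidence[0]

corpus = {"policy.md": "Refunds are issued within 14 days."}
print(agentic_answer(stub_llm, corpus, "What is the refund window?"))
# → Refunds are issued within 14 days.
```

A real agent would replace `stub_llm` with an LLM that formulates new search terms from accumulated evidence, which is what lets it resolve multi-step queries that a single fixed top-k lookup cannot.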
Superior Document Ingestion with Docling
Docling excels at parsing diverse formats (PDFs, HTML, Word documents, slides, spreadsheets, audio/video) and outputs structured DocTags (an XML-like hierarchy) that convert to Markdown, HTML, or JSON. Its hierarchical chunking follows document structure, preserving context better than fixed-size splitting. Pipelines include:
- Simple: Text extraction for Markdown/HTML/Word.
- ASR: Speech-to-text for audio/video.
- PDF Standard: Small models for layout analysis, table/image extraction, OCR (for scanned docs).
- PDF VLM: Granite Docling 258M vision-language model for end-to-end extraction.
Optional toggles include table structure capture, OCR, and image descriptions (slower, but richer output). Embed the resulting chunks via flexible providers (OpenAI-compatible APIs or local models), then index them in OpenSearch for hybrid vector/keyword search with filtering and aggregation.
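Serving both sides of hybrid search from one index means mapping each chunk with a keyword-searchable text field alongside a `knn_vector` field (a real OpenSearch field type). The sketch below only builds the index body; the field names (`text`, `source`, `embedding`) and index name are illustrative, not OpenRAG's actual schema.

```python
def chunk_index_mapping(dim: int) -> dict:
    """Index body pairing a full-text field with a k-NN vector field,
    plus a keyword metadata field usable for knowledge filters."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "text": {"type": "text"},          # BM25 keyword side
                "source": {"type": "keyword"},     # metadata for filtering
                "embedding": {
                    "type": "knn_vector",          # vector side
                    "dimension": dim,
                },
            }
        },
    }

body = chunk_index_mapping(dim=1024)
# With opensearch-py, roughly: client.indices.create(index="chunks", body=body)
```

Keeping metadata like `source` as a `keyword` field is what makes exact-match filtering and aggregations cheap at query time.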
Hybrid Search and Agentic Generation in OpenSearch + Langflow
OpenSearch (an Elasticsearch fork) supports multi-model vector search, which eases embedding-model migrations at the cost of some query slowdown, and the JVector k-NN engine, which indexes live on disk; by avoiding a full in-memory index, it scales further than in-memory HNSW or IVF setups. Agentic retrieval in Langflow gives the LLM tools (e.g., a multi-model OpenSearch retriever, a calculator to avoid arithmetic hallucinations, an MCP server) plus instructions to search iteratively, yielding precise answers along with tool traces and suggested follow-up queries. The UI lets you upload and sync folders, inspect chunks and extracted objects, create knowledge filters (e.g., by metadata), and attach cloud connectors (Google Drive, SharePoint, OneDrive) via OAuth for automatic syncing.
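A hybrid request itself pairs a BM25 clause with a k-NN clause under OpenSearch's `hybrid` query (available in recent OpenSearch versions); score fusion happens in a search pipeline configured with a normalization processor, which is not shown here. Field names and the pipeline name are assumptions for illustration.

```python
def hybrid_query(text: str, vector: list[float], k: int = 10) -> dict:
    """Query body combining lexical (match) and vector (knn) retrieval.
    OpenSearch fuses the two ranked lists via a normalization search pipeline."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": text}}},                 # BM25 side
                    {"knn": {"embedding": {"vector": vector, "k": k}}},   # vector side
                ]
            }
        }
    }

body = hybrid_query("refund policy", [0.1] * 1024)
# With opensearch-py, roughly:
# client.search(index="chunks", body=body, params={"search_pipeline": "norm-pipeline"})
```

Because both clauses run against the same index, metadata filters can be applied uniformly before fusion, which is what knowledge filters in the UI rely on.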
Tune, Evaluate, and Extend Without Reinventing
Customize via settings (chunk size and overlap, Docling flags, system prompts, API keys for app integration) or in Langflow's drag-and-drop editor: add guardrails that parse and validate inputs, swap in Ollama models, or build new flows. Any flow can be exposed as an API or MCP server for other agents to call. Version 0.4.0 (Next.js frontend, Python backend) is available today; star or contribute on GitHub. Test outcomes iteratively: OpenRAG's modularity lets you establish a baseline, measure (e.g., via Langflow enrichment), and adapt to your data and users instead of reinventing the wheel for every project.
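Calling an exposed flow from another application can be sketched with the standard library alone. The endpoint path and payload shape follow Langflow's documented run API, but verify them against your deployed version; the base URL, flow id, and API key below are placeholders.

```python
import json
from urllib import request

def run_flow_request(base_url: str, flow_id: str, message: str, api_key: str) -> request.Request:
    """Build a POST against a Langflow run endpoint (path/payload are
    assumptions based on Langflow's run API; check your version's docs)."""
    payload = {"input_value": message, "input_type": "chat", "output_type": "chat"}
    return request.Request(
        url=f"{base_url}/api/v1/run/{flow_id}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )

req = run_flow_request("http://localhost:7860", "my-flow-id",
                       "What changed in v0.4.0?", "YOUR_API_KEY")
# To send it: with request.urlopen(req) as resp: print(resp.read().decode())
```

Wrapping the request construction in a function keeps credentials and flow ids out of call sites, which helps when the same flow is reused across services.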