Why Start with OpenRAG's Opinionated Baseline
RAG remains complex because of the many variables involved: PDF parsing pain, chunking strategy, fast-evolving embedding models, and optional refinements such as summaries, chunk expansion, cross-encoders, re-ranking, and query rewriting, all of which must be tailored to your documents, users, and queries. Claims that "RAG is dead" or "solved" ignore these realities: long context windows do not make million-token datasets cheap to process per query, and the naive pipeline (extract text, chunk, embed, store in a vector DB, retrieve top-k) breaks down in production. OpenRAG provides a high-quality, extensible baseline built on three open-source projects: Docling for document processing, OpenSearch for search and indexing, and Langflow for visual orchestration and agents. It runs fully offline with local models such as IBM Granite 3B (LLM) or Qwen3 0.6B/6B (embeddings), so air-gapped setups work. The stack enables agentic retrieval, where the LLM decides which searches and tools to run, handling multi-step queries dynamically instead of relying on a rigid single top-k lookup.
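The naive pipeline the text critiques fits in a few lines, which is exactly why it is tempting. A toy sketch of it (term-frequency vectors stand in for a real embedding model; all names and the sample text are hypothetical):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text: str, size: int) -> list[str]:
    """Fixed-size word chunks -- the 'naive' splitting strategy."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_k(query: str, chunks: list[str], k: int) -> list[str]:
    """Single-shot top-k retrieval: rank every chunk against the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = "OpenSearch indexes chunks. Docling parses PDFs. Langflow orchestrates agents."
index = chunk(docs, size=4)
print(top_k("who parses PDFs?", index, k=1))
```

This works on toy data and fails in production for the reasons listed above: no structure awareness, no keyword fallback, and no way for the model to reformulate or iterate on the query.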
Superior Document Ingestion with Docling
Docling excels at parsing diverse formats (PDFs, HTML, Word, slides, spreadsheets, audio/video), outputting structured DocTags (an XML-like hierarchy) that convert to Markdown, HTML, or JSON. Hierarchical chunking based on document structure preserves context better than fixed-size splits. Available pipelines:
- Simple: Text extraction for Markdown/HTML/Word.
- ASR: Speech-to-text for audio/video.
- PDF Standard: Small models for layout analysis, table/image extraction, OCR (for scanned docs).
- PDF VLM: the Granite Docling 258M vision model for end-to-end extraction.
Options such as table-structure capture, OCR, and image descriptions can be toggled (slower but richer output). Chunks are embedded via flexible providers (OpenAI or local) and indexed in OpenSearch for hybrid vector/keyword search with filtering and aggregation.
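The point of hierarchical chunking is that each chunk carries the heading path above it, so retrieval keeps document context. A stdlib-only sketch of the idea (not Docling's actual chunker API; the sample document is hypothetical):

```python
def hierarchical_chunks(markdown: str) -> list[dict]:
    """Split Markdown by headings, tagging each chunk with its heading path."""
    path: list[str] = []      # current heading breadcrumb, e.g. ["Guide", "Install"]
    chunks: list[dict] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headings": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            del path[level - 1:]              # drop headings at this level or deeper
            path.append(line.lstrip("# ").strip())
        else:
            body.append(line)
    flush()
    return chunks

doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Use\nOpen the app."
for c in hierarchical_chunks(doc):
    print(" > ".join(c["headings"]), "|", c["text"])
```

Embedding the breadcrumb alongside the text ("Guide > Install | Run the installer.") is what lets a later query like "how do I install it" land on the right chunk even when the body never repeats the word "Guide".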
Hybrid Search and Agentic Generation in OpenSearch + Langflow
OpenSearch (an Elasticsearch fork) supports multi-model vector search, which eases embedding-model migrations at some speed cost, and the JVector KNN plugin enables live indexing from disk: the index need not fit fully in memory, so it scales better than HNSW or IVF. Agentic retrieval in Langflow gives the LLM tools (e.g., a multi-model OpenSearch retriever, a calculator to avoid arithmetic hallucinations, an MCP server) plus instructions to search iteratively, yielding precise answers along with tool traces and suggested follow-up queries. The UI lets you upload and sync folders, inspect chunks and objects, create knowledge filters (e.g., by metadata), and connect cloud sources (Google Drive, SharePoint, OneDrive) via OAuth for automatic sync.
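Hybrid search has to merge two independently ranked lists, one from keyword (BM25) scoring and one from vector KNN. Reciprocal rank fusion is one common way to do this (OpenSearch also supports score normalization; the document IDs below are hypothetical):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)
    over every ranking it appears in, then sort by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # BM25 order
vector_hits  = ["doc1", "doc5", "doc3"]   # KNN order
print(rrf([keyword_hits, vector_hits]))
```

A document ranked well by both retrievers (doc1 here) rises to the top even though neither list put it first everywhere, which is the behavior that makes hybrid search more robust than either signal alone.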
Tune, Evaluate, and Extend Without Reinventing
Customize via settings (chunk size and overlap, Docling flags, system prompts, API keys for app integration) or in Langflow's drag-and-drop editor: add guardrails that parse and validate inputs, swap in Ollama models, or build new flows. Expose the result as an API or MCP server for other agents. Version 0.4.0 is playable today (Next.js frontend, Python backend); star or contribute on GitHub. Test outcomes iteratively: OpenRAG's modularity lets you establish a baseline, measure (e.g., via Langflow enrichment), and adapt to your data and users without reinventing the wheel per project.
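The chunk size and overlap settings mentioned above amount to a sliding window over the text. A minimal sketch, counting whitespace-separated words (real chunkers count model tokens):

```python
def sliding_chunks(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-size chunks where consecutive chunks share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

text = " ".join(f"w{i}" for i in range(10))
print(sliding_chunks(text, size=4, overlap=2))
# Each chunk repeats the last 2 words of its predecessor.
```

Larger overlap reduces the chance of splitting an answer across a chunk boundary but inflates the index and embedding cost, which is why it is exposed as a tunable rather than a fixed default.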