Why Start with OpenRAG's Opinionated Baseline
RAG remains complex because of the many variables involved: PDF parsing pain, chunking strategy, fast-evolving embedding models, and optional refinements such as summaries, chunk expansion, cross-encoders, re-ranking, and query rewriting, all of which must be tailored to your documents, users, and queries. Claims that "RAG is dead" or "solved" ignore these realities: long context windows do not make million-token datasets cheap to process per query, and the naive pipeline (extract text, chunk, embed, store in a vector DB, retrieve top-k) breaks down in production. OpenRAG provides a high-quality, extensible baseline built on three open-source projects: Docling for document processing, OpenSearch for search and indexing, and Langflow for visual orchestration and agents. It runs fully offline with local models such as IBM Granite 3B (LLM) or Qwen3 0.6B/6B (embeddings), so air-gapped setups work. The stack enables agentic retrieval, where the LLM decides which searches and tools to run, handling multi-step queries dynamically instead of relying on a rigid single top-k lookup.
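The naive pipeline the text critiques fits in a few lines, which is exactly why it is tempting. A toy sketch of it (term-frequency vectors stand in for a real embedding model; all names and the sample text are hypothetical):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text: str, size: int) -> list[str]:
    """Fixed-size word chunks -- the 'naive' splitting strategy."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_k(query: str, chunks: list[str], k: int) -> list[str]:
    """Single-shot top-k retrieval: rank every chunk against the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = "OpenSearch indexes chunks. Docling parses PDFs. Langflow orchestrates agents."
index = chunk(docs, size=4)
print(top_k("who parses PDFs?", index, k=1))
```

This works on toy data and fails in production for the reasons listed above: no structure awareness, no keyword fallback, and no way for the model to reformulate or iterate on the query.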
Superior Document Ingestion with Docling
Docling excels at parsing diverse formats (PDFs, HTML, Word, slides, spreadsheets, audio/video), outputting structured DocTags (an XML-like hierarchy) that convert to Markdown, HTML, or JSON. Hierarchical chunking based on document structure preserves context better than fixed-size splits. Available pipelines:
- Simple: Text extraction for Markdown/HTML/Word.
- ASR: Speech-to-text for audio/video.
- PDF Standard: Small models for layout analysis, table/image extraction, OCR (for scanned docs).
- PDF VLM: the Granite Docling 258M vision model for end-to-end extraction.
Options such as table-structure capture, OCR, and image descriptions can be toggled (slower but richer output). Chunks are embedded via flexible providers (OpenAI or local) and indexed in OpenSearch for hybrid vector/keyword search with filtering and aggregation.
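The point of hierarchical chunking is that each chunk carries the heading path above it, so retrieval keeps document context. A stdlib-only sketch of the idea (not Docling's actual chunker API; the sample document is hypothetical):

```python
def hierarchical_chunks(markdown: str) -> list[dict]:
    """Split Markdown by headings, tagging each chunk with its heading path."""
    path: list[str] = []      # current heading breadcrumb, e.g. ["Guide", "Install"]
    chunks: list[dict] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headings": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            del path[level - 1:]              # drop headings at this level or deeper
            path.append(line.lstrip("# ").strip())
        else:
            body.append(line)
    flush()
    return chunks

doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Use\nOpen the app."
for c in hierarchical_chunks(doc):
    print(" > ".join(c["headings"]), "|", c["text"])
```

Embedding the breadcrumb alongside the text ("Guide > Install | Run the installer.") is what lets a later query like "how do I install it" land on the right chunk even when the body never repeats the word "Guide".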
Hybrid Search and Agentic Generation in OpenSearch + Langflow
OpenSearch (an Elasticsearch fork) supports multi-model vector search, which eases embedding-model migrations at some speed cost, and the JVector KNN plugin enables live indexing from disk: the index need not fit fully in memory, so it scales better than HNSW or IVF. Agentic retrieval in Langflow gives the LLM tools (e.g., a multi-model OpenSearch retriever, a calculator to avoid arithmetic hallucinations, an MCP server) plus instructions to search iteratively, yielding precise answers along with tool traces and suggested follow-up queries. The UI lets you upload and sync folders, inspect chunks and objects, create knowledge filters (e.g., by metadata), and connect cloud sources (Google Drive, SharePoint, OneDrive) via OAuth for automatic sync.
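Hybrid search has to merge two independently ranked lists, one from keyword (BM25) scoring and one from vector KNN. Reciprocal rank fusion is one common way to do this (OpenSearch also supports score normalization; the document IDs below are hypothetical):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)
    over every ranking it appears in, then sort by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # BM25 order
vector_hits  = ["doc1", "doc5", "doc3"]   # KNN order
print(rrf([keyword_hits, vector_hits]))
```

A document ranked well by both retrievers (doc1 here) rises to the top even though neither list put it first everywhere, which is the behavior that makes hybrid search more robust than either signal alone.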
Tune, Evaluate, and Extend Without Reinventing
Customize via settings (chunk size and overlap, Docling flags, system prompts, API keys for app integration) or in Langflow's drag-and-drop editor: add guardrails that parse and validate inputs, swap in Ollama models, or build new flows. Expose the result as an API or MCP server for other agents. Version 0.4.0 is playable today (Next.js frontend, Python backend); star or contribute on GitHub. Test outcomes iteratively: OpenRAG's modularity lets you establish a baseline, measure (e.g., via Langflow enrichment), and adapt to your data and users without reinventing the wheel per project.
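The chunk size and overlap settings mentioned above amount to a sliding window over the text. A minimal sketch, counting whitespace-separated words (real chunkers count model tokens):

```python
def sliding_chunks(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-size chunks where consecutive chunks share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

text = " ".join(f"w{i}" for i in range(10))
print(sliding_chunks(text, size=4, overlap=2))
# Each chunk repeats the last 2 words of its predecessor.
```

Larger overlap reduces the chance of splitting an answer across a chunk boundary but inflates the index and embedding cost, which is why it is exposed as a tunable rather than a fixed default.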