Local AI Agent Stack: Ollama as LLM, MCP as Libraries

Agentic Systems as Programmable Stacks

Map traditional programming to LLM agents: the LLM (via Ollama) acts as the language runtime, MCP servers function as swappable libraries for capabilities, and Markdown-defined skills serve as the executable programs. This analogy makes every layer visible and replaceable, enabling full control without vendor lock-in. Run the entire stack on a single laptop using no cloud LLMs or paid services, wired together by a minimal Python orchestrator and one JSON config file.

Ollama provides the local LLM runtime for reasoning and decision-making. MCP servers deliver modular tools (like data access or APIs) that the LLM calls into, mimicking library imports. Skills, written in Markdown, define specific agent behaviors as self-contained programs the LLM interprets and executes.

Wiring and Execution Flow

The Python orchestrator handles coordination: it loads the JSON config to initialize Ollama, MCP servers, and skills, then routes LLM outputs to invoke the right MCP libraries or skills. This setup supports iterative reasoning loops where the LLM decides tool use, executes via MCP/skills, and refines based on results—all locally.

Trade-off: Local execution prioritizes privacy and cost-zero runs but limits to hardware-constrained models; scale by swapping Ollama models or adding MCPs without rewriting core logic.

Production-Ready Ops Example

Query: "The on-call engineer is in country X. Is today a public holiday there, and if so, which of their open P1 issues need backup coverage?"

The agent combines local data sources (via MCPs) like holiday calendars, engineer locations, and issue trackers. LLM reasons over inputs, calls MCP libraries for data retrieval, applies Markdown skills for analysis (e.g., filtering P1 issues), and outputs actionable coverage recommendations. This handles real on-call shifts, demonstrating agentic reliability for ops without external dependencies.