Build Knowledge Bases from Agent Failures

Assign real enterprise problems to AI agents; their failures reveal exact knowledge gaps. Fill those gaps iteratively to build a demand-driven context base that makes agents semi-autonomous, a far better approach than dumping uncurated data into RAG.

Why Enterprise AI Fails Despite Hype

AI excels at reasoning, code generation, and benchmarks, yet stalls on Jira tickets and business delivery because it lacks institutional knowledge: the tribal, undocumented domain specifics. Enterprises dump Confluence, Jira, and GitHub into RAG or MCP servers (10-20 per team), assuming retrieval fixes it. The reality: roughly 40% of the needed knowledge is tribal, and another 20% of what is documented is outdated, unreliable, or duplicated. Output is nondeterministic and untested; without evals, you end up entering the missing knowledge yourself. The McKinsey stat: 88% of companies use AI, only 6% see value. Push strategies (build everything upfront) exhaust you, and agents remain knowledge consumers, not managers.

Memento analogy: the protagonist has a 15-minute memory and tattoos notes onto himself to persist them. Same for AI: great at computation, terrible at retaining enterprise context without structured help.

Quote: "Why do my Jira tickets not moving? Because that defines the business delivery and ROI."

Demand-Driven Context: Pull from Failures Like TDD

Flip to a pull strategy: treat agents like new hires. Give them problems (Jira tickets, incidents), let them fail, surface the gaps, fill them in as the domain expert, and have the agent curate the answers into reusable Markdown blocks. Analogies:

  • Monolith → Microservices: Break knowledge into agent-usable chunks.
  • Waterfall → Agile: Iterative cycles.
  • New hire onboarding: Assign tasks; the hire asks questions and fills in the docs.
  • TDD: Write failing tests (problems), implement to pass (fill gaps), refactor (curate).

A preprint backs this up: an arXiv paper on demand-driven context shows accuracy jumping with each cycle.

Core Cycle (repeat per problem; a minimal code sketch follows the list):

  1. Assign Problem: A real ticket or incident (e.g., root-cause analysis). The agent retrieves from the monolith (Confluence/Slack/GitHub, simulated via files or MCP).
  2. Agent Fails & Demands: Outputs a checklist of missing information (terminology, business logic), scores its confidence (1-5), and flags critical gaps. It discovers undocumented entities.
  3. Human Fills: The domain expert provides the exact missing knowledge (pre-prepared answers in the demo).
  4. Agent Curates: Updates the knowledge base, discovering related entities and structuring them in Markdown (entities grow from 56 to 70+ per cycle). Persistence: files for the demo, but MCP or Confluence can be plugged in.
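
A minimal sketch of one cycle in Python, under heavy assumptions: `agent.attempt_task`, the `Gap`/`CycleResult` shapes, and the interactive fill step are illustrative stand-ins, not the framework's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Gap:
    entity: str          # e.g. an undocumented service, term, or business rule
    critical: bool       # flagged by the agent when it blocks the task
    answer: str = ""     # filled in by the domain expert

@dataclass
class CycleResult:
    confidence: float    # agent's self-score, 1-5
    gaps: list[Gap] = field(default_factory=list)

def run_cycle(agent, problem: str, knowledge_base: dict[str, str]) -> CycleResult:
    """One demand-driven cycle: assign -> fail/demand -> fill -> curate."""
    # 1. Assign the problem; the agent retrieves from the monolith plus current KB.
    result = agent.attempt_task(problem, knowledge=knowledge_base)

    # 2. The agent reports confidence and a checklist of missing entities.
    if result.confidence >= 4 and not any(g.critical for g in result.gaps):
        return result  # good enough, no human needed this cycle

    # 3. A domain expert fills each gap (interactively in the demo).
    for gap in result.gaps:
        gap.answer = input(f"Missing knowledge for '{gap.entity}': ")

    # 4. The agent curates the answers into reusable context blocks and retries.
    for gap in result.gaps:
        knowledge_base[gap.entity] = gap.answer
    return agent.attempt_task(problem, knowledge=knowledge_base)
```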

After 14 incidents, confidence rose from 1.4 (every gap critical) to 4.4. The agent evolves from knowledge consumer to knowledge curator.

Principles: Agents must reason after retrieval (the step missing in basic RAG). Gaps surface only through real problems; you cannot guess tribal knowledge upfront. And no one else will fix your monolith: LLM providers chase better models, and the $9B retrieval market ignores curation.

Common Mistakes Avoided:

  • Skipping evals: check for delivered value, not just that output exists.
  • Overbuilding RAG on junk data.
  • Manual exhaustion: automate once the cycle is proven.

Quote: "Unless we break down that monolith knowledge base into context blocks useful for agents, it won't work."

Hands-On Implementation: Agentic Framework

The approach works with any agent framework (Claude for the live demo, Copilot in production at IKEA). Components:

  • Skills/Rules/Agents/Hooks: Customized for failure analysis, gap checklists, and curation.
  • Knowledge Base: Markdown files with one section per entity; updates are visible live (see the sketch after this list).
  • Retrieval First: Fetch from the monolith, then run gap-finding if confidence is low.
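
A minimal sketch of the "entities as sections" layout, assuming one Markdown file per domain; the path, helper name, and example entity are hypothetical.

```python
from pathlib import Path

KB_FILE = Path("knowledge/delivery-services.md")  # hypothetical location

def append_entity(name: str, body: str) -> None:
    """Append one curated entity as its own '## ' section in the Markdown KB."""
    KB_FILE.parent.mkdir(parents=True, exist_ok=True)
    if not KB_FILE.exists():
        KB_FILE.write_text("# Knowledge Base\n")
    with KB_FILE.open("a") as f:
        f.write(f"\n## {name}\n\n{body}\n")

# Example: curating a gap the agent surfaced during the latency incident.
append_entity(
    "Notification retry logic",
    "Retries are capped at 3 attempts with exponential backoff; owned by the delivery team.",
)
```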

Demo Flow (terminal/Claude): The incident is "root-cause the high latency in service X." The agent scans the monolith and lists 6 missing entities (e.g., notification logic). You provide the information, the agent adds 5-6 more entities it discovered along the way, and on re-run it succeeds.

Scaling this manually becomes painful after 1-2 cycles. Quality criteria: confidence above 4, zero critical gaps, full task completion (a simple gate is sketched below).
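
A tiny sketch of that gate; the inputs are assumptions about what the agent reports back after each run.

```python
def cycle_passes(confidence: float, critical_gaps: int, task_completed: bool) -> bool:
    """Stop iterating on a problem once all three quality criteria hold."""
    return confidence > 4.0 and critical_gaps == 0 and task_completed

# Example: the first demo run fails the gate, the re-run passes it.
assert not cycle_passes(confidence=1.4, critical_gaps=6, task_completed=False)
assert cycle_passes(confidence=4.4, critical_gaps=0, task_completed=True)
```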

Quote: "One problem surfaced six entities never documented... it discovers gaps, stores new info."

Automate at Scale: Batch Historical Tickets

Grab archived Jira tickets and incidents (JSON or Markdown with descriptions and comments). A Platform Ops Agent validates a batch (20 incidents) against the knowledge base (sketched after this list):

  • Per ticket: Retrieve relevant docs and score them (trustworthy, outdated, or missing).
  • Aggregate: Surface enterprise-wide gaps across the batch.
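
A minimal sketch of the batch pass, assuming a JSON file of incidents and an agent call that returns per-document scores; `score_docs` and its fields are illustrative, not a real API.

```python
import json
from collections import Counter
from pathlib import Path

def validate_batch(agent, tickets_path: str = "archive/incidents.json") -> Counter:
    """Score KB coverage per ticket, then aggregate enterprise-wide gaps."""
    tickets = json.loads(Path(tickets_path).read_text())  # descriptions + comments
    gap_counts: Counter[str] = Counter()

    for ticket in tickets:  # e.g. a batch of ~20 archived incidents
        # Per ticket: retrieve relevant docs and classify each one.
        report = agent.score_docs(ticket["description"], ticket.get("comments", []))
        for doc in report:
            if doc["status"] in ("outdated", "missing"):
                gap_counts[doc["entity"]] += 1

    # Aggregate: the most frequent gaps are the enterprise-wide ones to fill first.
    return gap_counts

# Usage idea: feed the top gaps back into the fill-and-curate cycle above.
# top_gaps = validate_batch(agent).most_common(10)
```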

Running cycles on this history builds the base before live use; from then on, agents handle knowledge management largely autonomously.

Trade-offs: Initial failures are expected (that is the design goal), and domain experts are needed briefly in each cycle. Works on cloud or local models. At IKEA (100+ engineers, 6 product teams), it powers delivery services.

Exercise: Pick 5 real tickets, run the cycle in your agent (Claude/Copilot), and track entity growth and confidence per cycle (a minimal tracker is sketched below). Prerequisites: familiarity with an agent framework, an engineering background, and Markdown for docs.
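
A tiny tracking sketch for the exercise; it only assumes you can count `## ` sections in your KB file and note the agent's self-reported confidence after each cycle.

```python
from pathlib import Path

history: list[tuple[int, float]] = []  # (entity_count, confidence) per cycle

def record_cycle(kb_path: str, confidence: float) -> None:
    """Log entity count and confidence so you can see the base improving."""
    entity_count = Path(kb_path).read_text().count("\n## ")
    history.append((entity_count, confidence))
    print(f"cycle {len(history)}: {entity_count} entities, confidence {confidence:.1f}")

# Expect a rough trend like 56 -> 70+ entities and 1.4 -> 4+ confidence over the 5 tickets.
```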

Quote: "We move agent from consumer to knowledge manager... the whole knowledge management is your job."

Key Takeaways

  • Start with real problems, not curated context; failures pinpoint exact gaps.
  • One cycle: Problem → Retrieve → Gap checklist → Fill → Curate Markdown blocks.
  • Repeat 10-15x on incidents: Confidence scales 1.4 → 4.4+; base self-improves.
  • Automate batch validation on historical tickets for enterprise scale.
  • Eval by task completion/ROI (Jira velocity), not output existence.
  • Tools: Claude/Copilot + hooks; persist via MCP/Confluence.
  • Avoid RAG spam over tribal knowledge; demand-driven wins.
  • Fits broader workflow: Pre-agentic cleanup for production autonomy.

© 2026 Edge