Context Engines: Fix Agent Context to Cut Tokens 50%
Agents fail without org-specific context; build a reasoning layer that personalizes retrieval, resolves conflicts, and respects permissions to deliver task-focused info, cutting one benchmark task from 2.5 hours and 21M tokens to 25 minutes and 10M tokens.
Agents Need Org Context to Avoid Doom Loops
Peter Werry from Unblocked explains that AI agents start at "ground zero" with no knowledge of your codebase, conventions, or decisions. Without targeted context, they rip through repos inefficiently, leading to wrong code, long reviews, and "doom loops"—endless iterations fixing misguided outputs. Humans used to be the context engine, manually feeding tickets and correcting agents, but scaling to background agents demands automation.
The goal mirrors human onboarding: accumulate "battle scars" from incidents, mentors, and history to know "why" things are built a certain way. Werry references an adoption curve (lifted from Vimath) showing progression from autocomplete (2022, 8k tokens) to parallel agents with MCP servers, toward cloud-based YOLO agents. Humans become the bottleneck in context-switching; context engines enable agentic freedom by supplying "all the context you need and most importantly none you don't" in optimized form.
"Context engineering is kind of the art of supplying uh all the context that you need and most importantly none of the context that you don't need in a highly optimized way so that when the agent starts to run it executes the task uh in a streamlined way that's in line with your organization's best practices." (Peter Werry defining context engines—core to avoiding waste.)
A demo contrasts MCP-only with a context engine: without the engine, an agent missed a legacy dependency on Anthropic token-budget handling and deleted the code; with it, the agent reasoned over that history and preserved backwards compatibility.
Myths: RAG, MCP, and Big Windows Fall Short
Naive RAG over docs causes "satisfaction of search": agents grab the first plausible match (e.g., from Notion) and stop, missing golden nuggets in Slack incidents or deleted PRs. In large orgs it also pulls irrelevant code, creates conflicts, wastes tokens on compaction, and ignores both the task at hand and the user.
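To make the failure mode concrete, here is a minimal sketch (toy corpus, hypothetical names, plain keyword matching standing in for vector search) contrasting a retriever that stops at the first plausible hit with one that aggregates across sources:

```python
# Toy illustration of "satisfaction of search": a naive retriever stops at
# the first plausible hit, while org knowledge is spread across sources.

SOURCES = {  # toy corpus; a real engine indexes Notion, Slack, PRs, incidents
    "notion": ["Design doc: service uses token budgets for rate limiting"],
    "slack": ["Incident 2023-11: removing token budgets broke legacy clients"],
    "prs": ["PR review: keep token-budget path for backwards compatibility"],
}

def matches(query: str, doc: str) -> bool:
    """Crude keyword match standing in for vector similarity."""
    return any(word in doc.lower() for word in query.lower().split())

def naive_rag(query: str) -> list[str]:
    """Return the first plausible match and stop (satisfaction of search)."""
    for docs in SOURCES.values():
        for doc in docs:
            if matches(query, doc):
                return [doc]  # stops here; Slack/PR context never retrieved
    return []

def cross_source(query: str) -> list[str]:
    """Gather matches from every source so confirming/conflicting context survives."""
    return [doc for docs in SOURCES.values() for doc in docs if matches(query, doc)]

print(naive_rag("token budgets"))     # 1 hit: the Notion doc only
print(cross_source("token budgets"))  # 3 hits: doc + incident + PR review
```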
Connecting MCP servers gives access but no relationships or "why." Bigger windows (e.g., Gemini's 1M tokens) excel at needle-in-a-haystack retrieval but fail at reasoning across sources or selecting the truth. Most orgs' knowledge exceeds 1M tokens anyway; even 50M wouldn't resolve conflicts or focus retrieval.
"Access doesn't equal understanding." (Werry on why MCP/RAG alone leads to doom loops—emphasizes need for reasoning layer.)
Shipped code is just the tip of the iceberg; below it sit user intent, past rejections, failures, and deletions. Agents need that history to infer why things are absent.
Building Blocks: Graphs, Resolution, Personalization
Inputs: planning tools, docs, Slack/Teams, code, PRs. Outputs: agents, CLI, code review in SCM, messaging.
Key requirements:
- Unified relationships via a social-engineering graph: link data deeply, e.g., distilling repeated PR comments into "memories" of best practices. Easy: PR-to-Slack links. Hard: inferring decisions from patterns and incidents. (The workshop builds this graph; see the first sketch after this list.)
- Conflict resolution: reject naive recency, since docs and chats lag the code. The evolved heuristic: main branch as truth, plus expert conversations (forward-looking), plus history (what not to repeat). Surface unresolvable conflicts for human input; see the second sketch below.
- Permissions: Flow access controls—e.g., Slack private channels only for authorized users; answers stay private.
- Targeted retrieval: bias toward the user's repos (ranked by their PR counts): deep vector search there, shallow search elsewhere. Stay task-focused to save tokens. (The third sketch below combines this with the permission filtering above.)
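A minimal sketch of the first requirement's "memories" idea; the graph shape, node naming, and the three-repeat threshold are illustrative assumptions, not Unblocked's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    """Toy knowledge graph: nodes keyed by id, edges as (src, relation, dst)."""
    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

def distill_memories(graph: Graph, pr_comments: list[tuple[str, str]],
                     min_repeats: int = 3) -> None:
    """Promote review comments repeated across PRs into 'memory' nodes.

    pr_comments: (pr_id, normalized_comment) pairs. A comment seen on
    min_repeats+ distinct PRs is treated as an org best practice.
    """
    seen: dict[str, set[str]] = {}
    for pr_id, comment in pr_comments:
        seen.setdefault(comment, set()).add(pr_id)
    for comment, prs in seen.items():
        if len(prs) >= min_repeats:
            mem_id = f"memory:{hash(comment) & 0xffff:04x}"  # toy id scheme
            graph.nodes[mem_id] = {"kind": "memory", "text": comment}
            for pr_id in prs:
                graph.edges.append((mem_id, "derived_from", pr_id))

g = Graph()
distill_memories(g, [
    ("pr1", "use the retry helper for outbound calls"),
    ("pr2", "use the retry helper for outbound calls"),
    ("pr3", "use the retry helper for outbound calls"),
    ("pr4", "nit: rename variable"),  # one-off comment: not a memory
])
print(g.nodes)  # one memory node: the thrice-repeated review comment
```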
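For the second requirement, a hedged sketch of multi-signal conflict scoring; the weights and the escalation margin are assumptions chosen to encode "main branch is truth, experts are forward-looking, recency alone is weak":

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    text: str
    on_main: bool          # does main-branch code back this claim?
    expert_endorsed: bool  # did a domain expert state it in conversation?
    age_days: int

def score(e: Evidence) -> float:
    """Multi-signal score: code truth dominates, experts rank next,
    recency is deliberately a mild tiebreaker (docs/chats lag code)."""
    s = 3.0 if e.on_main else 0.0
    s += 2.0 if e.expert_endorsed else 0.0
    s += max(0.0, 1.0 - e.age_days / 365)  # mild recency bonus
    return s

def resolve(candidates: list[Evidence], margin: float = 1.0):
    """Pick a winner only when it clearly beats the runner-up; otherwise
    surface the conflict for human input rather than guessing."""
    ranked = sorted(candidates, key=score, reverse=True)
    if len(ranked) > 1 and score(ranked[0]) - score(ranked[1]) < margin:
        return None  # unresolved: escalate to a human
    return ranked[0]

doc = Evidence("old doc says budgets removed", on_main=False,
               expert_endorsed=False, age_days=400)
code = Evidence("budget path still on main", on_main=True,
                expert_endorsed=True, age_days=30)
print(resolve([doc, code]).text)  # main-branch + expert evidence wins
```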
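And for the third requirement, combined with the permissions one above: a sketch that searches the user's most-active repos deeply and everything else shallowly, dropping hits the user cannot read. The ACL map, repo ranking by PR count, and the deep/shallow k values are all assumptions:

```python
class ToyIndex:
    """Stand-in for a vector index: repo -> list of (doc_id, text)."""
    def __init__(self, docs: dict[str, list[tuple[str, str]]]):
        self.docs = docs

    def repos(self) -> list[str]:
        return list(self.docs)

    def search(self, query: str, repo: str, k: int) -> list[tuple[str, str]]:
        # keyword match standing in for vector similarity, truncated to k
        return [(i, t) for i, t in self.docs[repo]
                if query.lower() in t.lower()][:k]

def retrieve(query: str, user: str, index: ToyIndex,
             acl: dict[str, set[str]], pr_counts: dict[str, dict[str, int]],
             deep_k: int = 20, shallow_k: int = 2, top_repos: int = 1) -> list[str]:
    """Deep search in the user's most-active repos (by PR count), shallow
    elsewhere; permissions flow end-to-end, so unauthorized hits are dropped."""
    ranked = sorted(pr_counts.get(user, {}), key=pr_counts[user].get, reverse=True)
    deep = set(ranked[:top_repos])
    out: list[str] = []
    for repo in index.repos():
        k = deep_k if repo in deep else shallow_k
        out += [text for doc_id, text in index.search(query, repo, k)
                if user in acl.get(doc_id, set())]  # ACL filter
    return out

index = ToyIndex({
    "payments": [("d1", "token budget retry policy"),
                 ("d2", "token budget incident notes")],
    "frontend": [("d3", "token budget UI copy"),
                 ("d4", "private token budget memo")],
})
acl = {"d1": {"ana"}, "d2": {"ana"}, "d3": {"ana"}, "d4": {"bob"}}  # d4 is private
print(retrieve("token budget", "ana", index, acl,
               {"ana": {"payments": 40, "frontend": 2}}))
# deep results from payments, shallow from frontend, d4 filtered out
```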
High-level flow: Data sources → graph/reasoning → personalized context → agents.
"Satisfaction of search is a term that actually comes out of uh the medical field in radiology... they might find something... and then they stop." (Werry borrowing radiology concept—explains why agents halt prematurely, missing key org history.)
Production Lessons: What Unblocked Got Wrong
Unblocked initially optimized for access: a knowledge graph plus retrieval tools. It failed; agents couldn't traverse the graph meaningfully.
They also hid conflicts by forcing naive resolution (recency or code bias). Better: surface them and learn from the feedback.
Don't cache answers: code and docs change constantly, so prior answers regress to the mean and pollute context.
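If some reuse is unavoidable, one hedged pattern (my assumption, not Unblocked's practice) is to key any cache on a fingerprint of the underlying sources, so any code or doc change invalidates prior answers:

```python
import hashlib

def corpus_fingerprint(docs: list[str]) -> str:
    """Hash the current corpus; any code/doc change changes the key."""
    return hashlib.sha256("\n".join(sorted(docs)).encode()).hexdigest()

class FreshAnswerer:
    """Re-derive answers whenever the underlying sources change, instead of
    serving a cached answer that may describe code that no longer exists."""
    def __init__(self, answer_fn):
        self.answer_fn = answer_fn
        self._cache: dict[tuple[str, str], str] = {}

    def ask(self, query: str, docs: list[str]) -> str:
        key = (query, corpus_fingerprint(docs))
        if key not in self._cache:  # cache hit only if sources are unchanged
            self._cache[key] = self.answer_fn(query, docs)
        return self._cache[key]

answerer = FreshAnswerer(lambda q, docs: f"answer to {q!r} from {len(docs)} docs")
docs = ["def pay(): ..."]
print(answerer.ask("how do payments work?", docs))
docs.append("def refund(): ...")  # corpus changed: old answer is not reused
print(answerer.ask("how do payments work?", docs))
```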
Experiment: on a larger task, the agent without the engine repeated past failures (it missed implementation details); with the engine, it nailed it. Time: 2.5 hours → 25 minutes. Tokens: 21M → 10M. (Claude-estimated figures; the directional efficiency gain is the point.)
Use the engine in planning (the biggest wins, via an MCP skill) and in review (where it understands motivations beyond the code); a wiring sketch follows.
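A wiring sketch using the official MCP Python SDK's FastMCP helper; the get_task_context tool and its internals are hypothetical stand-ins for the engine, not Unblocked's API:

```python
# pip install mcp  (official Model Context Protocol Python SDK)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("context-engine")

@mcp.tool()
def get_task_context(task: str, user: str) -> str:
    """Return task-focused, permission-filtered org context for an agent's
    planning step. Hypothetical internals: the graph traversal, conflict
    scoring, and ACL filtering sketched earlier."""
    return f"[context for {user!r} on {task!r}: conventions, incidents, memories]"

if __name__ == "__main__":
    # Agents connect over stdio and call get_task_context before writing code.
    mcp.run()
```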
"Initially we optimized for access not understanding... that does not work." (Werry on first failure—shift to reasoning was key pivot.)
Key Takeaways
- Build social graphs to link code/PRs/Slack into memories of decisions/best practices, not just surface links.
- Resolve conflicts with multi-signal scoring (recency + code truth + expert forward-look); surface the irresolvable ones for humans.
- Personalize via user PR history: deep retrieval on their repos, shallow elsewhere.
- Enforce permissions end-to-end—e.g., private Slack only for access holders.
- Integrate early in agent flows (planning/review) via MCP skills for 50%+ token/time savings.
- Avoid caching answers or feeding prior outputs—dynamic orgs demand fresh retrieval.
- Target 'why' via history/incidents, beating RAG's 'what' for production code.
- Prototype with graphs before full engine; workshop-style builds reveal gaps fast.