Context Engineering Demystified: Agentic Search at the Core
Context engineering selects what enters an LLM's context window from diverse sources: local files, databases, the web, working memory, agent skills, and long-term memory. Leonie argues it is about 80% agentic search, i.e., the mechanisms that decide what to retrieve and how, rather than model choice. Early RAG ran a fixed vector search over the raw user query, often retrieving irrelevant chunks or missing multi-hop questions. Agentic RAG exposes retrieval as tools and lets the agent decide: Retrieve at all? Rewrite the query? Run multiple rounds? This evolves retrieval from rigid pipelines into dynamic decisions.
Key principle: No single tool suffices. Native tools handle sources (e.g., file search for codebases, SQL/ESQL for DBs, web scrapers), but shell tools (LangChain's shell, Anthropic's bash, OpenAI's exec) add versatility via CLI commands like ls, grep, curl. Combine them: Vector for semantics, keyword for exact matches, general-purpose for complex filters. Trade-off: Shell is flexible but risky (security, errors); specialized tools are reliable but narrow.
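Combining vector and keyword results needs a fusion step; a minimal sketch using reciprocal rank fusion (a common merging heuristic, shown here with hypothetical session IDs, not code from the talk) could look like:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists into one, rewarding documents
    that rank highly in any individual list (RRF scoring)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: semantic search vs. exact keyword match.
vector_hits = ["talk-12", "talk-07", "talk-33"]
keyword_hits = ["talk-12", "talk-91"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

A document found by both tools ("talk-12") rises to the top, which is the point of running the tools side by side rather than picking one.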
"Context engineering is about 80% agentic search because it's this little box right here, the arrow from sources to the context window."
Building Reliable Search Tools: Descriptions and Parameters
Effective tools start with precise descriptions. Poor ones: One-sentence generics ("Search the database"). Good ones specify:
- Core purpose: What it does.
- Triggers: When to use (e.g., "For conference sessions on AI constraints"), avoid (e.g., "Not for web data").
- Relationships: Sequence (e.g., "Load ESQL skill first").
Reinforce in system prompts: "You are a search agent... decide if retrieval needed. Use tool for condition."
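Put together, a description following the purpose/triggers/relationships pattern might read like the sketch below (hypothetical tool name; with LangChain's @tool decorator, the docstring becomes the tool description the model sees):

```python
def search_conference_sessions(query: str) -> str:
    """Search indexed conference sessions by semantic similarity.

    Use when: the user asks about conference talks, speakers, or session
    topics (e.g., "sessions on AI constraints").
    Do not use for: general web facts or anything outside the conference index.
    Relationships: for exact keyword filters, prefer execute_esql_query
    (load the ESQL skill first).
    """
    ...  # retrieval call goes here
```

The triggers ("Use when" / "Do not use for") are what steer the agent away from, say, a web search tool when the database tool is the right choice.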
Parameter complexity scales failure risk:
| Complexity | Example | Agent Challenge |
|---|---|---|
| Low | get_customer(id: str) | Easy ID generation. |
| Medium | semantic_search(query: str, k: int=3, filters: dict) | Balancing params. |
| High | execute_esql(query: str) | Full query syntax. |
Always add try-except for self-correction: Return errors to the agent (e.g., an invalid wildcard) instead of crashing. Test tools standalone before agent integration.
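The try-except pattern can be factored into a small wrapper so every tool returns errors as text the agent can read; a minimal sketch (hypothetical helper names, not from the talk):

```python
def safe_tool(fn):
    """Wrap a tool so failures come back as a readable string the agent
    can self-correct from, instead of crashing the agent loop."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            return f"Error: {type(e).__name__}: {e}. Adjust the input and retry."
    return wrapper

@safe_tool
def parse_limit(raw: str) -> str:
    # Stand-in for real query execution: fails on non-numeric input.
    return str(int(raw))
```

A bad call like `parse_limit("three")` now returns an "Error: ValueError: ..." string the agent sees on its next turn, rather than an exception that kills the run.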
"Tool description is the most important aspect... add trigger conditions, relationships."
Diagnosing and Fixing Agent Failure Modes
Agents fail in predictable ways—address systematically:
- No tool called: Relies on parametric knowledge. Fix: Prompt "Always retrieve for factual queries."
- Wrong tool: Picks web over DB. Fix: Detailed descriptions + system prompt prioritization.
- Wrong parameters: E.g., SQL's `%` wildcard vs. `*` in ESQL. Fix: Skills for docs.
Quality criteria: Tool returns relevant, non-zero results (zero results may signal the query needs a rewrite). Evaluate: Does the output cite the retrieved context? Is it coherent across turns?
Common mistake: Over-relying on semantic search, which fails on exact keywords ("GPA" matches "Gemma" via token similarity). Solution: Hybrid stacks.
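The zero-result case deserves a retry loop rather than an answer from nothing; a control-flow sketch (stub search and rewrite functions for illustration; in practice the rewrite would be an LLM call):

```python
def search_with_rewrite(query, search_fn, rewrite_fn, max_rounds=3):
    """Treat zero hits as a signal to rewrite the query, not as an answer."""
    for _ in range(max_rounds):
        hits = search_fn(query)
        if hits:
            return hits
        query = rewrite_fn(query)  # in practice: ask the LLM to rephrase
    return []

# Stub demo: a keyword index that only knows the exact token "GPA".
index = {"GPA": ["Samuel's talk on GPA"]}
search = lambda q: index.get(q, [])
rewrite = lambda q: q.upper()  # hypothetical rewrite; an LLM call in practice
```

Here `search_with_rewrite("gpa", search, rewrite)` misses on the first round, rewrites, and hits on the second, which is exactly the multi-round behavior agentic RAG enables.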
"The most challenging aspect... was getting the agent to not call the web search tool but the database search tool."
Step-by-Step: Semantic to General-Purpose Retrieval with Skills
Assumes: Python/LangChain basics, a local Elasticsearch cluster, and chunked/indexed data (e.g., conference sessions: text embedded, metadata filterable).
Prerequisites: Mid-level (has built basic agents); install langchain, elasticsearch, and an embedding model (e.g., gte-large-en-v1.5).
- Vanilla Semantic Tool (Brittle baseline):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import ElasticVectorSearch
from langchain.tools import tool

embeddings = HuggingFaceEmbeddings(model_name="thenlper/gte-large")
vectorstore = ElasticVectorSearch(elasticsearch_url, index_name, embeddings)

@tool
def semantic_search(query: str) -> str:
    """Search conference sessions semantically."""
    docs = vectorstore.similarity_search(query, k=3)
    return "\n".join(doc.page_content for doc in docs)
```
Works: "Regulatory constraints" → relevant talks. Fails: "GPA" → Gemma models (semantic drift).
Agent: `create_react_agent(llm, [semantic_search], system_prompt)`.
- General-Purpose ESQL Tool (More flexible, error-prone):
Switch to GPT-4o-mini (nano too weak for query gen).
```python
import json

from elasticsearch import Elasticsearch
from langchain.tools import tool

client = Elasticsearch("http://localhost:9200")

@tool
def execute_esql_query(esql_query: str) -> str:
    """Execute ESQL against the conference index. Use the ESQL skill first."""
    try:
        response = client.esql.query(query=esql_query)
        return json.dumps(response.body)
    except Exception as e:
        return f"Error: {str(e)}"
```
Agent generates `from conference_sessions where text like '%GPA%'` → error (wrong wildcard). Self-corrects next turn.
- Agent Skills for Progressive Disclosure:
- Markdown file `skills/esql.md`:

```markdown
---
name: ESQL Skill
description: Generate ESQL queries.
---
Structure: from index | where condition | limit k
Strings: double quotes. Wildcards: * not %.
```

- Tools: `create_openai_functions_agent` + skill loader middleware.
- Updated prompt/tool description: "Load ESQL skill before execute_esql_query."

Result: Agent loads skill → `from conference_sessions | where text like "*GPA*" | limit 3` → exact match (Samuel's talk).
Fits in workflow: Post-indexing, pre-agent loop. Practice: Index your data, break semantic tool, iterate fixes.
"Doing good search is incredibly difficult... curate your own stack."
Shell Tools: Filesystem and Beyond
Shell unlocks local files: ls, grep, cat. LangChain @tool wraps subprocess.
Limitations: No built-in semantics; security (sandbox). Extend: Custom CLIs (e.g., DB CLI, curl for web).
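A minimal subprocess wrapper for such a shell tool (hypothetical helper, sandboxing deliberately omitted here and required in practice; decorate with LangChain's @tool to expose it to the agent) might look like:

```python
import shlex
import subprocess

def run_shell(command: str, timeout: int = 10) -> str:
    """Run a CLI command (ls, grep, cat, curl, ...) and return its output.
    Failures come back as text so the agent can self-correct."""
    try:
        result = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if result.returncode != 0:
            return f"Error (exit {result.returncode}): {result.stderr}"
        return result.stdout
    except Exception as e:
        return f"Error: {e}"
```

Note the same try-except pattern as the ESQL tool: a missing binary or a timeout becomes an "Error: ..." string in the agent's context instead of a crash.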
Teaser: File search beats naive recursion for code agents; combine with skills for CLI docs.
Key Takeaways
- Prioritize agentic search: 80% of context engineering success.
- Craft tool descriptions with purpose, triggers, relationships; reinforce in prompts.
- Add try-except everywhere—let agents self-correct from errors.
- Use skills for complex params (e.g., query langs); progressive disclosure scales docs.
- Hybrid tools: Semantic for concepts, keyword/ESQL for exact, shell for files/web.
- Test standalone: Break with keywords/filters before agent.
- Model matters: Nano for simple, mini+ for query gen.
- Zero results? Rewrite, don't answer.
- Sequence tools: Skills → General query → Shell fallback.
- Build stacks, not silos: Match source to tool.