Reducing MCP Tool Context Overhead with Hermes Agent Tool Search

Solving the MCP Context Tax

Integrating multiple Model Context Protocol (MCP) servers into an AI agent often leads to significant context window bloat. Because every tool schema is sent to the model on every turn, deployments with several servers can spend 15,000–60,000 tokens per turn on tool definitions alone. This overhead wastes context space and can also cause "decision paralysis": faced with many irrelevant tool options, the model's tool selection degrades.

Progressive Tool Disclosure

Hermes Agent’s new Tool Search feature mitigates this by replacing the full tool array with a three-part bridge: tool_search queries the catalog, tool_describe fetches a specific tool's schema on demand, and tool_call executes the function. As a result, the model sees full schemas only for the tools it needs on a given turn.
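To make the bridge concrete, here is a minimal Python sketch of the three entry points. Only the names tool_search, tool_describe, and tool_call come from the article; the Tool and ToolBridge classes, their argument shapes, and the two example tools are hypothetical, and a plain substring match stands in for the real ranking. It is an illustration of the idea, not Hermes Agent's actual interface.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical bridge: only the three method names come from the article.
BRIDGE_TOOLS = ["tool_search", "tool_describe", "tool_call"]


@dataclass
class Tool:
    name: str
    description: str
    schema: dict[str, Any]  # full JSON Schema, withheld until described
    fn: Callable[..., Any]


class ToolBridge:
    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {t.name: t for t in tools}

    def tool_search(self, query: str, limit: int = 5) -> list[dict[str, str]]:
        # Return only names and short descriptions, never full schemas.
        # Plain substring matching stands in for the real ranking here.
        q = query.lower()
        hits = [t for t in self._tools.values()
                if q in t.name.lower() or q in t.description.lower()]
        return [{"name": t.name, "description": t.description} for t in hits[:limit]]

    def tool_describe(self, name: str) -> dict[str, Any]:
        # Load one tool's full schema into context only when asked for it.
        return self._tools[name].schema

    def tool_call(self, name: str, arguments: dict[str, Any]) -> Any:
        # Dispatch to the underlying function with model-supplied arguments.
        return self._tools[name].fn(**arguments)


tools = [
    Tool("github_create_issue", "Create an issue in a GitHub repository",
         {"type": "object", "properties": {"repo": {"type": "string"},
                                           "title": {"type": "string"}}},
         lambda repo, title: f"created '{title}' in {repo}"),
    Tool("slack_post_message", "Post a message to a Slack channel",
         {"type": "object", "properties": {"channel": {"type": "string"},
                                           "text": {"type": "string"}}},
         lambda channel, text: f"posted to {channel}"),
]
bridge = ToolBridge(tools)
print(bridge.tool_search("issue"))
# [{'name': 'github_create_issue', 'description': 'Create an issue in a GitHub repository'}]
print(bridge.tool_call("github_create_issue", {"repo": "acme/app", "title": "Bug"}))
# created 'Bug' in acme/app
```

The design point is that the model's upfront tool list stays at three entries no matter how many MCP tools are registered; a full schema enters the context only after tool_describe is called for a specific name.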

Under the hood, the system ranks tool metadata against the query with the BM25 retrieval algorithm, falling back to a literal substring match when BM25's inverse document frequency (IDF) weighting drops to zero and leaves every tool unscored. The catalog is rebuilt statelessly on every turn rather than cached, so search results always reflect the agent's live tool registry.
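The following Python sketch shows BM25 ranking with a substring fallback over a catalog that is re-indexed on every call. It is an illustration under stated assumptions, not Hermes Agent's code: the article does not name a BM25 variant, so this uses the classic Robertson–Spärck Jones IDF floored at zero with the common defaults k1 = 1.5 and b = 0.75, and the bm25_search function and the name-to-description catalog format are hypothetical. Under this IDF, a term that appears in exactly half of the tools scores zero, which on small catalogs happens often enough to make a fallback necessary.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit,
    # so "github_create_issue" becomes ["github", "create", "issue"].
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(catalog: dict[str, str], query: str, limit: int = 5,
                k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank tool names against `query`; `catalog` maps name -> description.

    The index is rebuilt from `catalog` on every call, so nothing is cached
    between turns.
    """
    names = list(catalog)
    docs = [tokenize(f"{name} {catalog[name]}") for name in names]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    df = Counter(term for d in docs for term in set(d))

    scored = []
    for name, doc in zip(names, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Classic Robertson/Sparck Jones IDF, floored at zero (assumed variant).
            # A term found in exactly half the tools gets an IDF of exactly 0.
            idf = max(0.0, math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5)))
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scored.append((score, name))

    ranked = [name for score, name in sorted(scored, key=lambda s: -s[0]) if score > 0]
    if ranked:
        return ranked[:limit]
    # Fallback: literal, case-insensitive substring match over name and description.
    needle = query.lower()
    return [name for name in names
            if needle in name.lower() or needle in catalog[name].lower()][:limit]


catalog = {
    "github_create_issue": "Create an issue in a GitHub repository",
    "github_list_pulls": "List pull requests in a GitHub repository",
    "slack_post_message": "Post a message to a Slack channel",
    "calendar_create_event": "Create an event on a calendar",
}
print(bm25_search(catalog, "slack message"))  # ['slack_post_message']
print(bm25_search(catalog, "github"))  # ['github_create_issue', 'github_list_pulls']
```

Here "slack message" is ranked by BM25, while "github" appears in exactly two of the four tools, so its IDF is zero and only the substring fallback finds the GitHub tools. Because the index is built inside each call, a tool added to the catalog between turns is searchable immediately, with no cache to invalidate.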

Performance and Configuration

This feature is not just a token-saving mechanism. Anthropic's evaluations of tool search report Opus 4's accuracy rising from 49% to 74%, a gain attributed to reduced decision noise.

By default, the system runs in auto mode, which activates tool search only when deferrable tool schemas exceed 10% of the active context window. Developers can tune this behavior in hermes.yaml through the threshold_pct, search_default_limit, and max_search_limit parameters.
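A minimal Python sketch of the auto-mode decision and the two search-limit parameters follows. The three field names mirror the hermes.yaml keys the article lists, and the 10% default comes from the article; everything else is assumed. The defaults for the two limits are placeholders, the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and should_defer and effective_limit are hypothetical helpers, not Hermes Agent functions. The clamping in effective_limit (use the default when no limit is given, never exceed the cap) is likewise an assumed reading of the parameter names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolSearchConfig:
    # Field names mirror the hermes.yaml keys named in the article.
    threshold_pct: float = 10.0     # auto-mode trigger, per the article
    search_default_limit: int = 5   # placeholder value (assumption)
    max_search_limit: int = 20      # placeholder value (assumption)


def estimate_tokens(schema: dict[str, Any]) -> int:
    # Rough heuristic (assumption): ~4 characters of serialized JSON per token.
    return len(json.dumps(schema)) // 4


def should_defer(schemas: list[dict[str, Any]], context_window: int,
                 cfg: ToolSearchConfig) -> bool:
    # Auto mode: hide schemas behind tool search only when their combined
    # size exceeds threshold_pct percent of the context window.
    total = sum(estimate_tokens(s) for s in schemas)
    return total > context_window * cfg.threshold_pct / 100


def effective_limit(requested: Optional[int], cfg: ToolSearchConfig) -> int:
    # Use the default when no limit is requested; never exceed the cap.
    limit = cfg.search_default_limit if requested is None else requested
    return max(1, min(limit, cfg.max_search_limit))


cfg = ToolSearchConfig()
schemas = [{"name": f"tool_{i}", "description": "d" * 4000} for i in range(50)]
print(should_defer(schemas, 200_000, cfg))      # True: ~50k tokens > 20k
print(should_defer(schemas[:2], 200_000, cfg))  # False: ~2k tokens < 20k
print(effective_limit(None, cfg), effective_limit(100, cfg))  # 5 20
```

Comparing against a percentage of the window, rather than a fixed token count, lets the same setting behave sensibly across models with very different context sizes.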
