Slash AI Agent Tokens 98% with MCP Optimizations
Code execution treats MCP servers as file systems, loading only needed tool files (150K to 2K tokens, 98% cut), while tool search dynamically discovers thousands of tools, reducing upfront load by 85%.
Progressive Disclosure Cuts Upfront Token Load
Code execution replaces upfront tool definitions by presenting MCP servers as a file system inside a sandbox. The agent explores folders (one per server, such as Google Drive or Salesforce) and reads only the TypeScript files for the tools it actually needs, which is progressive disclosure. In Anthropic's example, moving a Google Drive document into Salesforce takes 150,000 tokens with direct tool calls but about 2,000 with code execution, a 98% reduction. There are other benefits. Data can be filtered in code, so loops and conditionals run in the sandbox instead of passing through the model's context. Sensitive information such as emails and phone numbers can stay isolated from the model. Fewer model round trips are needed. The approach does require a sandbox with proper isolation and resource limits, but Cloudflare's similar "Code Mode" points to the same pattern.
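A minimal sketch of the discovery side of this pattern, assuming a hypothetical layout with one folder per server and one .ts file per tool (the folder names, file names, and signatures are illustrative, not Anthropic's). It is written in Python to model what the agent does in the sandbox: list servers, list tools, then read only the files it needs. Token counts use a crude four-characters-per-token heuristic, which is an assumption rather than a real tokenizer.

```python
import tempfile
from pathlib import Path

# Hypothetical tool files: one folder per MCP server, one file per tool.
TOOL_FILES = {
    "google-drive/getDocument.ts": "export async function getDocument(input: { documentId: string }): Promise<Document>",
    "google-drive/listFiles.ts": "export async function listFiles(input: { folderId: string }): Promise<File[]>",
    "salesforce/updateRecord.ts": "export async function updateRecord(input: { objectType: string; recordId: string; data: object }): Promise<void>",
    "salesforce/query.ts": "export async function query(input: { soql: string }): Promise<Record[]>",
}


def build_tree(root: Path) -> None:
    """Write the hypothetical tool files under root."""
    for rel, body in TOOL_FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body)


def list_servers(root: Path) -> list[str]:
    """Step 1: the agent sees only folder (server) names."""
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def list_tools(root: Path, server: str) -> list[str]:
    """Step 2: tool names are just file names inside one server folder."""
    return sorted(p.stem for p in (root / server).glob("*.ts"))


def read_tool(root: Path, server: str, tool: str) -> str:
    """Step 3: read the full definition of a single tool."""
    return (root / server / f"{tool}.ts").read_text()


def approx_tokens(text: str) -> int:
    """Crude heuristic (assumption): about 4 characters per token."""
    return len(text) // 4


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root)
    everything = "".join(p.read_text() for p in sorted(root.rglob("*.ts")))
    needed = read_tool(root, "google-drive", "getDocument") + read_tool(root, "salesforce", "updateRecord")
    print(list_servers(root))  # ['google-drive', 'salesforce']
    print(approx_tokens(needed), "vs", approx_tokens(everything))
```

With four tools the gap is modest; the point is that the cost of steps 1 and 2 stays small no matter how many tool files sit on disk.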
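Tool search complements this. Add Anthropic's tool search tool (regex or BM25 ranking) to your tool list and set defer_loading: true on tools that do not need to be loaded upfront. Deferred tools stay out of the initial context; the agent queries the tool catalog on demand, much like a file search, so thousands of tools can be handled dynamically. This cuts the roughly 55,000-token overhead of a multi-server setup by 85%. Without it, tool-selection accuracy drops once more than 30-50 tools are available.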
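To illustrate the ranking idea, here is a minimal local BM25 search over a catalog of deferred tools. The real tool search runs inside Anthropic's API; this sketch only mirrors the defer_loading flag on hypothetical tool entries and scores tool names plus descriptions with standard BM25 (k1 = 1.5 and b = 0.75 are common defaults, chosen here as assumptions).

```python
import math
import re
from collections import Counter

# Hypothetical catalog; "defer_loading" mirrors the API flag described above.
TOOLS = [
    {"name": "search_web", "description": "Search the web", "defer_loading": False},
    {"name": "github_create_issue", "description": "Create an issue in a GitHub repository", "defer_loading": True},
    {"name": "slack_post_message", "description": "Post a message to a Slack channel", "defer_loading": True},
    {"name": "jira_create_ticket", "description": "Create a Jira ticket", "defer_loading": True},
]


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric runs; snake_case names split into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return indices of docs with a positive BM25 score, best first."""
    if not docs:
        return []
    corpus = [tokenize(d) for d in docs]
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    df = Counter(term for d in corpus for term in set(d))
    scored = []
    for i, doc in enumerate(corpus):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        if score > 0:
            scored.append((-score, i))  # negate so sorted() puts best first
    return [i for _, i in sorted(scored)]


def upfront_tools(tools: list[dict]) -> list[str]:
    """Only non-deferred tools are placed in the initial context."""
    return [t["name"] for t in tools if not t["defer_loading"]]


def search_deferred(query: str, tools: list[dict], top_k: int = 3) -> list[str]:
    """Rank deferred tools by name + description; load only the top hits."""
    deferred = [t for t in tools if t["defer_loading"]]
    docs = [f"{t['name']} {t['description']}" for t in deferred]
    return [deferred[i]["name"] for i in bm25_rank(query, docs)[:top_k]]


print(upfront_tools(TOOLS))                          # ['search_web']
print(search_deferred("create github issue", TOOLS))  # ['github_create_issue', 'jira_create_ticket']
```

Only the matched definitions would then be expanded into context; the Slack tool never costs a token for this query.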
Dynamic context loading works at three levels: (1) list the available servers, (2) load tool summaries for a server only when it looks relevant, and (3) load the full schema only for the tools actually chosen. This pairs with Bright Data's skills: folders containing a skill.md file (YAML metadata plus Markdown instructions), with 5 pre-built skills usable across 40+ agents through the Open Agent Skill Ecosystem.
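The three levels can be sketched as three progressively more expensive views over the same registry. The servers, tools, and schemas below are hypothetical, and relevance is passed in by hand where a real agent would let the model decide.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str  # one line, cheap to load
    schema: dict  # full input schema, the expensive part


@dataclass
class Server:
    name: str
    description: str
    tools: list[Tool] = field(default_factory=list)


# Hypothetical servers and tools, for illustration only.
SERVERS = [
    Server("drive", "Google Drive documents", [
        Tool("get_document", "Fetch a document by ID",
             {"type": "object", "properties": {"document_id": {"type": "string"}}, "required": ["document_id"]}),
        Tool("list_files", "List files in a folder",
             {"type": "object", "properties": {"folder_id": {"type": "string"}}}),
    ]),
    Server("crm", "Salesforce records", [
        Tool("update_record", "Update a CRM record",
             {"type": "object", "properties": {"record_id": {"type": "string"}, "data": {"type": "object"}}}),
    ]),
]


def level1_servers(servers: list[Server]) -> dict:
    """Level 1: server names and one-line descriptions only."""
    return {s.name: s.description for s in servers}


def level2_summaries(servers: list[Server], relevant: set[str]) -> dict:
    """Level 2: tool summaries, only for servers judged relevant."""
    return {s.name: {t.name: t.summary for t in s.tools} for s in servers if s.name in relevant}


def level3_schemas(servers: list[Server], chosen: set[str]) -> dict:
    """Level 3: full schemas, only for the tools actually chosen."""
    return {t.name: t.schema for s in servers for t in s.tools if t.name in chosen}


def all_schemas(servers: list[Server]) -> dict:
    """Baseline for comparison: every schema loaded upfront."""
    return {t.name: t.schema for s in servers for t in s.tools}


print(level1_servers(SERVERS))
print(level2_summaries(SERVERS, {"drive"}))
print(len(json.dumps(level3_schemas(SERVERS, {"get_document"}))), "vs", len(json.dumps(all_schemas(SERVERS))))
```

With only three tools the saving is small once levels 1 and 2 are counted; it grows with the number of servers and schemas that never get loaded.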
Server-Side Scoping Minimizes Loaded Tools
Group tools by domain (e.g., e-commerce, finance) and load only the groups you need. Bright Data's MCP server (60+ tools in 11 groups, open-source under the MIT license on GitHub) supports this: specify groups through a groups URL parameter or an environment variable, and combine several groups in one session. For production, once discovery shows which tools you use, lock the server to that exact list (e.g., 4 of the 60 tools) with the tools environment variable. This maximizes savings but requires knowing your tools in advance.
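A minimal sketch of the scoping logic on the client side, assuming a hypothetical group catalog and a comma-separated groups query parameter; the real group names, tool names, and environment-variable names are in Bright Data's repository and are not reproduced here. The rule that an exact tool list overrides groups is also an assumption made for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical group catalog; not Bright Data's actual group or tool names.
CATALOG = {
    "ecommerce": ["amazon_product", "walmart_product", "ebay_search"],
    "finance": ["yahoo_finance", "stock_quote"],
    "social": ["linkedin_profile", "x_posts"],
}


def groups_from_url(url: str) -> list[str]:
    """Read a comma-separated groups query parameter (separator is an assumption)."""
    values = parse_qs(urlparse(url).query).get("groups", [])
    return [g for v in values for g in v.split(",") if g]


def resolve_tools(catalog: dict, groups=(), exact_tools=None) -> list[str]:
    """An exact tool list wins (production lock-down); otherwise take the union of groups."""
    known = {t for tools in catalog.values() for t in tools}
    if exact_tools:
        unknown = set(exact_tools) - known
        if unknown:
            raise ValueError(f"unknown tools: {sorted(unknown)}")
        return sorted(exact_tools)
    missing = [g for g in groups if g not in catalog]
    if missing:
        raise ValueError(f"unknown groups: {missing}")
    return sorted({t for g in groups for t in catalog[g]})


# Discovery phase: connect with two groups and see what they expose.
groups = groups_from_url("https://mcp.example.com/mcp?groups=ecommerce,finance")
print(resolve_tools(CATALOG, groups))
# Production: lock to the few tools actually used.
print(resolve_tools(CATALOG, exact_tools=["amazon_product", "stock_quote"]))
```

Failing loudly on unknown names is a deliberate choice here: a typo in a locked-down tool list would otherwise silently remove a tool the agent depends on.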
A layered MCP architecture uses sub-agents: separate discovery, planning, and execution layers insulate the main agent's context. The main agent sends inputs and receives results, which scales to many servers or to tools owned by different teams.
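A minimal sketch of the context insulation, with no model involved: each agent's context is just a list, a hypothetical sub-agent runs discovery, planning, and execution, and the main agent records only the input it sent and the result it got back. Keyword matching stands in for model-driven planning, and the two tools are hypothetical.

```python
class Agent:
    """Minimal agent whose context is just a list of entries."""

    def __init__(self, name):
        self.name = name
        self.context = []

    def remember(self, entry):
        self.context.append(entry)


class LayeredSubAgent(Agent):
    """Hypothetical sub-agent owning discovery, planning and execution."""

    def __init__(self, name, tools):
        super().__init__(name)
        self.tools = tools  # tool name -> callable(str) -> str

    def run(self, task):
        # Discovery layer: the full tool listing lands only in this context.
        self.remember(("discovered", sorted(self.tools)))
        # Planning layer: naive keyword matching stands in for model reasoning.
        plan = [name for name in sorted(self.tools) if name in task]
        self.remember(("plan", plan))
        # Execution layer: each intermediate output also stays here.
        data = task
        for name in plan:
            data = self.tools[name](data)
            self.remember((name, data))
        return data


def delegate(main, sub, task):
    """The main agent sees only the task it sent and the final result."""
    main.remember(("input", task))
    result = sub.run(task)
    main.remember(("result", result))
    return result


main = Agent("main")
web = LayeredSubAgent("web", {
    "fetch": lambda _: "raw page text " * 100,
    "summarize": lambda text: f"summary of {len(text)} chars",
})
print(delegate(main, web, "fetch and summarize"))  # summary of 1400 chars
print(len(main.context), "entries in main vs", len(web.context), "in sub-agent")
```

The 1,400-character page exists only in the sub-agent's context; with many servers, each can get its own sub-agent while the main agent's context grows by two entries per delegation.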
Output Optimizations Trim Response Tokens
Strip Markdown and other formatting from web and document results before they enter the context, saving tokens on every response. Parse Google results down to the top organic results, dropping ads and related searches.
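Both trims can be done with a few lines of post-processing. The Markdown stripper below is deliberately simple (regexes for images, links, headings, emphasis, and inline code) rather than a full parser, and the SERP dictionary shape (organic, ads, and related_searches keys) is a hypothetical stand-in for whatever your search tool actually returns.

```python
import json
import re


def strip_markdown(md: str) -> str:
    """Remove common Markdown syntax with regexes (not a full parser)."""
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", md)                   # drop images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)              # keep link text, drop URL
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.M)   # heading markers
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                      # bold
    text = re.sub(r"\*(.+?)\*", r"\1", text)                          # italics
    text = re.sub(r"`([^`]*)`", r"\1", text)                          # inline code ticks
    text = re.sub(r"\n{3,}", "\n\n", text)                            # collapse blank lines
    return text.strip()


def top_organic(serp: dict, n: int = 3) -> list[dict]:
    """Keep title and link of the first n organic results; drop everything else."""
    return [{"title": r["title"], "link": r["link"]} for r in serp.get("organic", [])[:n]]


# Hypothetical search response shape.
serp = {
    "organic": [
        {"title": f"Result {i}", "link": f"https://example.com/{i}", "snippet": "...", "sitelinks": []}
        for i in range(10)
    ],
    "ads": [{"title": "Sponsored", "link": "https://ads.example.com"}],
    "related_searches": ["query a", "query b"],
}
trimmed = top_organic(serp)
print(len(json.dumps(serp)), "->", len(json.dumps(trimmed)), "chars")
print(strip_markdown("# Title\n\nSee **this** [link](https://example.com)"))
```

Dropping link URLs is a trade-off: it saves tokens but means the agent cannot follow those links later, so keep them if the task involves crawling.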
Programmatic tool calling lets Claude write Python code that invokes tools (mark them with allowed_callers: ["code_execution"]). Intermediate results stay out of the context; only the final output enters it. This has improved scores on benchmarks such as BrowseComp and DeepSearchQA. MCP tools are not supported yet.
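The effect can be modeled in plain Python: a script loops over hypothetical tool functions, aggregates the raw line items itself, and returns only a compact summary. In the actual feature Claude writes this kind of code and it runs in Anthropic's code-execution environment; the tools, data, and budget check here are illustrative assumptions.

```python
import json

# Hypothetical data behind two hypothetical tools.
EXPENSES = {
    "alice": [1200.0, 800.0, 950.0],
    "bob": [300.0, 150.0],
    "carol": [2500.0, 700.0],
}


def get_team_members(team: str) -> list[str]:
    """Hypothetical tool: every member of the (single) example team."""
    return sorted(EXPENSES)


def get_expenses(member: str) -> list[dict]:
    """Hypothetical tool: raw expense line items, verbose on purpose."""
    return [{"member": member, "amount": amount, "category": "travel", "receipt_id": f"{member}-{i}"}
            for i, amount in enumerate(EXPENSES[member])]


def orchestrate(team: str, budget: float) -> str:
    """Code the model would write; it runs in the sandbox, not in the model's context."""
    over_budget = {}
    for member in get_team_members(team):
        items = get_expenses(member)  # intermediate result: never shown to the model
        total = sum(item["amount"] for item in items)
        if total > budget:
            over_budget[member] = total
    # Only this compact string is returned to the model.
    return json.dumps(over_budget, sort_keys=True)


raw = [item for m in get_team_members("eng") for item in get_expenses(m)]
print(len(json.dumps(raw)), "chars of raw items vs", len(orchestrate("eng", 2000.0)), "returned")
print(orchestrate("eng", 2000.0))  # {"alice": 2950.0, "carol": 3200.0}
```

Without this pattern, each get_expenses result would be a separate model round trip with its full payload in context.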
TOON (Token-Oriented Object Notation) declares field names once and then streams CSV-like rows, saving 30-60% of tokens versus JSON for flat lists (e.g., products with IDs, names, and prices). It breaks down on nested data such as user profiles.
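A minimal encoder for TOON's tabular form, assuming the key[N]{fields}: header followed by comma-separated rows indented two spaces. It covers only flat, uniform lists, uses simplified quoting (JSON-style quotes for strings containing commas, quotes, or leading/trailing whitespace), and rejects nested values, which reflects the limitation above. It is not a complete TOON implementation, and character counts are only a rough proxy for tokens.

```python
import json


def _cell(value) -> str:
    """Encode one scalar; nested values cannot go in a tabular row."""
    if isinstance(value, (dict, list)):
        raise ValueError("nested value: tabular TOON needs flat rows")
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    text = str(value)
    if isinstance(value, str) and ("," in text or '"' in text or text != text.strip()):
        return json.dumps(text)  # simplified quoting rule (assumption)
    return text


def to_toon_table(key: str, rows: list[dict]) -> str:
    """Declare fields once in the header, then emit one CSV-like line per row."""
    if not rows:
        raise ValueError("this sketch needs at least one row")
    fields = list(rows[0])
    if any(list(r) != fields for r in rows):
        raise ValueError("rows must share the same fields in the same order")
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(_cell(r[f]) for f in fields) for r in rows]
    return "\n".join([header] + lines)


products = [
    {"id": 1, "name": "Widget", "price": 9.99},
    {"id": 2, "name": "Gadget", "price": 19.5},
]
toon = to_toon_table("products", products)
print(toon)
print(len(toon), "chars vs", len(json.dumps({"products": products})), "as JSON")
```

The saving comes entirely from not repeating "id", "name", and "price" on every row, which is why it shrinks, or reverses, once rows stop being uniform.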
Stack these for maximum impact: tool groups at connection time, tool search for the outliers, programmatic calling for multi-step work, and stripping or TOON on outputs. Use code execution when you want to replace tool definitions entirely. Bright Data offers 5,000 free requests per month.