Slash 98% MCP Tokens via Code Execution & 9 More Tricks

Code execution treats MCP servers as file systems, loading only the tool files an agent needs (150K → 2K tokens, a 98% cut). Stack it with tool search (85% off a 55K baseline), scoped tool groups, and output stripping for the cheapest possible agents.

Progressive Disclosure Crushes Input Token Waste

MCP servers can waste half your context window on tool definitions—up to 150K tokens before the agent takes any action. Code execution fixes this by turning servers into explorable file systems: each tool becomes a TypeScript file in a per-server folder (e.g., Google Drive, Salesforce). The agent `ls`-es directories, reads only the relevant files, and executes calls locally. Anthropic's example moves a Drive doc into Salesforce using 2K tokens total (a 98% reduction from 150K). Benefits include filtering data with loops/conditionals before it reaches the model, keeping sensitive info (emails, phone numbers) out of context, and fewer model roundtrips. It requires a sandbox with proper isolation and resource limits, but Cloudflare's code mode validates the pattern.
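A minimal sketch of the file-system pattern, with a made-up directory layout and tool names (the real layout is whatever your harness generates from the MCP servers):

```python
import pathlib
import tempfile

# Hypothetical on-disk layout: one TypeScript stub per MCP tool,
# grouped by server, so the agent reads only what it needs.
root = pathlib.Path(tempfile.mkdtemp())
(root / "servers/google-drive").mkdir(parents=True)
(root / "servers/salesforce").mkdir(parents=True)
(root / "servers/google-drive/getDocument.ts").write_text(
    "export async function getDocument(id: string) { /* MCP call */ }"
)
(root / "servers/salesforce/updateRecord.ts").write_text(
    "export async function updateRecord(obj: string, data: object) { /* MCP call */ }"
)

def list_tools(server: str) -> list[str]:
    """Step 1: the agent lists a server's folder -- a few tokens, not 150K."""
    return sorted(p.stem for p in (root / "servers" / server).glob("*.ts"))

def read_tool(server: str, tool: str) -> str:
    """Step 2: load a single tool definition on demand."""
    return (root / "servers" / server / f"{tool}.ts").read_text()

print(list_tools("google-drive"))  # only this server's tool names enter context
```

The agent then writes code that imports the one file it read, so intermediate data flows through the sandbox rather than the model.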

Tool search dynamically loads tool definitions from a catalog using regex or BM25 ranking, similar to Claude's file search. Add the search tool, then mark non-essential tools to lazy-load (`defer_loading: true`). This cuts a 55K baseline by 85% and improves accuracy past the 30–50 tool mark, where selection quality degrades.
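A request-payload sketch of the pattern; the exact type string and field names follow Anthropic's tool-search beta as described above and should be treated as assumptions, not a definitive API reference:

```python
# One deferred tool keeps only its name in context until the model
# searches for it; essential tools stay fully loaded.
request = {
    "model": "claude-sonnet-4-5",
    "tools": [
        # The search tool itself (type string is an assumption).
        {"type": "tool_search_tool_regex_20251119",
         "name": "tool_search_tool_regex"},
        # Essential tool: full schema loaded up front.
        {"name": "get_weather",
         "description": "Current weather for a city",
         "input_schema": {"type": "object",
                          "properties": {"city": {"type": "string"}}}},
        # Non-essential tool: lazy-loaded only when search surfaces it.
        {"name": "crm_update",
         "description": "Update a CRM record",
         "input_schema": {"type": "object",
                          "properties": {"id": {"type": "string"}}},
         "defer_loading": True},
    ],
}
```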

Scoped loading groups similar tools (e.g., BrightData's 60 tools across 11 groups: e-commerce, finance, etc.). Specify groups via URL query (`?groups=...`) or an env var; multiple groups can be loaded per session. Pin exact tools (`tools=tool1,tool2`) for production—ideal after discovery, since loading 4 of 60 tools saves massively.

Dynamic context discovery adds three levels: (1) list servers, (2) tool summaries per server, (3) full schema on demand. It pairs with groups for layered savings. BrightData skills (skill.md files in YAML + markdown) enable this pattern across 40+ agents via the Open Agent Skill Ecosystem.
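The three levels can be sketched as plain lookup functions over a catalog; the catalog contents here are made up for illustration:

```python
# Level-by-level disclosure: each call returns only as much context
# as the agent asked for.
CATALOG = {
    "brightdata": {
        "search_engine": {
            "summary": "Query a search engine, return parsed organic results",
            "schema": {"type": "object",
                       "properties": {"query": {"type": "string"}}},
        },
    },
}

def list_servers() -> list[str]:
    """Level 1: server names only."""
    return list(CATALOG)

def list_tool_summaries(server: str) -> dict[str, str]:
    """Level 2: one-line summaries per tool."""
    return {t: v["summary"] for t, v in CATALOG[server].items()}

def get_tool_schema(server: str, tool: str) -> dict:
    """Level 3: the full schema, fetched only when the agent commits."""
    return CATALOG[server][tool]["schema"]
```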

Programmatic Calling & Architecture Keep Context Pristine

Programmatic tool calling lets Claude write Python that invokes tools inside the code-execution sandbox; intermediate results skip model context, and only the final output enters. Add the code_execution tool and mark tools with `allowed_callers: "code_execution"`. It boosts agentic benchmark scores (browsing, comprehension, deep-search QA). Gap: no MCP support yet.
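A payload sketch for the setup described above; the versioned type string and the list form of `allowed_callers` are assumptions based on Anthropic's beta naming conventions:

```python
# Marking a tool as callable only from sandboxed code keeps its
# intermediate outputs out of the model's context.
request = {
    "model": "claude-sonnet-4-5",
    "tools": [
        # Code-execution tool (versioned type string is an assumption).
        {"type": "code_execution_20250825", "name": "code_execution"},
        {
            "name": "query_database",
            "description": "Run a read-only SQL query",
            "input_schema": {"type": "object",
                             "properties": {"sql": {"type": "string"}}},
            # Only the sandbox may call this; the model never sees raw rows.
            "allowed_callers": ["code_execution_20250825"],
        },
    ],
}
```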

Layered servers split discovery, planning, and execution into sub-agents. The orchestrator stays lean, only passing inputs and receiving results—this scales to many servers or team-siloed tool sets.
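A toy sketch of the layering, with stubbed sub-agents standing in for real model calls: only compact summaries cross each boundary, so the orchestrator's context never holds the full catalogs:

```python
def discovery_agent(goal: str) -> list[str]:
    """Holds the full tool catalog in ITS context; returns names only."""
    return ["search_engine", "scrape_page"]  # stubbed result

def planning_agent(goal: str, tools: list[str]) -> list[dict]:
    """Turns tool names into an ordered plan; returns the plan only."""
    return [{"tool": t, "args": {"goal": goal}} for t in tools]

def execution_agent(plan: list[dict]) -> str:
    """Runs the plan; returns a final summary, not raw tool output."""
    return f"ran {len(plan)} steps"

def orchestrator(goal: str) -> str:
    tools = discovery_agent(goal)       # names cross the boundary
    plan = planning_agent(goal, tools)  # the plan crosses back
    return execution_agent(plan)        # only the summary returns
```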

Output Tweaks Yield 30-60% Extra Savings

Strip markdown and formatting from web/document results before they reach the model—modern models handle plain text well. For Google search, parse the top organic results and drop ads/related links (savings are page-dependent).
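A crude regex-based stripper as one possible implementation (real pipelines may use an HTML-to-text library instead):

```python
import re

def strip_markdown(text: str) -> str:
    """Crude markdown stripper for tool output: keep words, drop syntax."""
    text = re.sub(r"```.*?```", "", text, flags=re.S)     # code fences
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)      # images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # links -> label
    text = re.sub(r"[*_#>`|-]+", " ", text)               # emphasis, headers, tables
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace

strip_markdown("## Results\n**Bold** [link](https://x.com)")
# -> 'Results Bold link'
```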

TOON (Token-Oriented Object Notation) declares keys once, then streams CSV-like value rows. It beats JSON by 30-60% on flat lists (e.g., 3 products with no repeated id/name/price keys). It breaks down on nested data such as user profiles.
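A simplified TOON-style encoder illustrating the keys-once idea (this is a sketch of the format's shape, not a spec-complete implementation):

```python
import json

def to_toon(rows: list[dict]) -> str:
    """Declare keys once in a header, then stream CSV-like value rows.
    Only valid for flat, uniform records -- exactly TOON's sweet spot."""
    keys = list(rows[0])
    header = f"items[{len(rows)}]{{{','.join(keys)}}}:"
    lines = [",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header, *lines])

products = [
    {"id": 1, "name": "Widget", "price": 9.99},
    {"id": 2, "name": "Gadget", "price": 19.5},
    {"id": 3, "name": "Gizmo", "price": 4.25},
]
print(to_toon(products))
print(len(to_toon(products)), "chars vs", len(json.dumps(products)), "as JSON")
```

Because `id`, `name`, and `price` appear once instead of three times, the encoded size drops versus JSON, and the gap widens with more rows.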

Stack for 98% Total: Groups + Search + Calling + Stripping

Combine them: scope groups at connection time, use tool search for outliers, programmatic calling for multi-step work, output stripping everywhere, and TOON for tabular data. Code execution can replace direct tool calls entirely. All techniques are open source; BrightData offers 5K free requests/month (MIT-licensed on GitHub).

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge