Scale MCP Servers: 40 Tools, 95% Success, Stateless Redis

Reduce context 49% with 40 default tools grouped by CRUD; encode agent intent server-side for 95% success and fewer roundtrips; use OAuth/PKCE over PATs; run stateless per-request instances with Redis sessions handling 7M calls/week.

Optimize Tools and Context to Fix Agent Confusion

More tools degrade agent performance: LangChain research from February showed agents become confused and forgetful when too many tools are shoved into context. GitHub cut initial context load 49% by focusing tools on common usage patterns, then grouped CRUD operations to reach ~40 default tools that users can expand or contract. Output tokens dropped more than 75% by tailoring responses (e.g., concise PR lists). Anti-pattern: relying on user configuration; nearly everyone uses defaults, so servers must curate aggressively. Run evals on tool pools to ensure the right tool is called at the right time, avoiding over- or under-use. Result: agents succeed more often without micro-optimized tool descriptions.
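The CRUD grouping above can be sketched as collapsing four fine-grained tools into one tool with an `operation` argument, shrinking the tool list (and the context it consumes) roughly fourfold. This is a hypothetical illustration; the tool name `issues_tool` and the handler shapes are assumptions, not the GitHub MCP server's actual API.

```python
# Hypothetical sketch: one "issues" tool replaces separate
# create/get/update/delete tools by dispatching on `operation`.
def issues_tool(operation: str, **kwargs):
    handlers = {
        "create": lambda **kw: {"action": "created", **kw},
        "get":    lambda **kw: {"action": "fetched", **kw},
        "update": lambda **kw: {"action": "updated", **kw},
        "delete": lambda **kw: {"action": "deleted", **kw},
    }
    if operation not in handlers:
        # Unknown operations fail loudly so evals catch mis-calls.
        raise ValueError(f"unknown operation: {operation}")
    return handlers[operation](**kwargs)

# One schema in context instead of four:
result = issues_tool("create", title="flaky CI on main")
```

The agent now sees a single tool description with an enum parameter rather than four near-duplicate descriptions, which is where most of the context savings come from.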

Encode agent intent server-side: a single tool can make five API calls internally, making tools more robust and cutting failures, roundtrips, and context waste. Agents still hallucinate repository write permissions, but success rates exceeded 95%. Read-only mode, used by 17% of users, maps to the spec's annotation hint, but clients rarely expose it; surfacing it is an easy enterprise win.

Prioritize Secure Auth and Scoped Tools Over Tokens

Plain-text PATs invite abuse: they are long-lived, over-privileged, and agent-accessible. Push OAuth 2.1 with PKCE (GitHub added support) as the path of least resistance; no local runtime is needed. Reject dynamic client registration: it causes unbounded app-database growth, rate-limit bucketing issues, and gives no reliable client identity. Future: client ID metadata for easier logins.
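The PKCE half of that flow is small enough to show directly. Per RFC 7636, the client generates a random `code_verifier`, sends its SHA-256 `code_challenge` with the authorize request, and later proves possession by sending the verifier on token exchange:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate an OAuth 2.1 PKCE code_verifier and S256 code_challenge."""
    # RFC 7636: verifier is 43-128 chars from the unreserved set;
    # 32 random bytes base64url-encoded gives a 43-char verifier.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# authorize request carries: code_challenge=<challenge>&code_challenge_method=S256
# token exchange carries:    code_verifier=<verifier>
```

Unlike a PAT, nothing long-lived or copy-pasteable ever sits in the agent's reach: the verifier is ephemeral and the resulting token is scoped.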

Filter tools by token scopes automatically, with no user action. OAuth step-up challenges for missing scopes interactively (VS Code supports this), preventing failures on clean installs. Strip user-specific tools for server-to-server tokens (e.g., Actions), reducing failures and context. Prompt-injection exfiltration (the Invariant Labs demo; Simon Willison's "lethal trifecta") hits any agent setup; the utility-versus-protection trade-off is unsolved, especially across deployments ranging from air-gapped Enterprise to full-token collaboration repos.
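Scope-based filtering can be sketched as a subset check between each tool's required scopes and the scopes the token actually carries. The tool names and scope strings here are hypothetical placeholders, not GitHub's real scope taxonomy:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    required_scopes: frozenset  # OAuth scopes this tool needs to succeed

# Hypothetical catalog; real scope names will differ.
TOOLS = [
    Tool("list_pull_requests", frozenset({"repo:read"})),
    Tool("merge_pull_request", frozenset({"repo:read", "repo:write"})),
    Tool("run_workflow", frozenset({"workflow"})),
]

def tools_for_token(granted_scopes: set) -> list:
    """Expose only tools the token can actually execute.

    Advertising a tool the token cannot use guarantees a failed call
    and wasted context, so filtering happens before listing.
    """
    return [t for t in TOOLS if t.required_scopes <= granted_scopes]

visible = tools_for_token({"repo:read"})  # only list_pull_requests survives
```

A server-to-server token (e.g., Actions) simply presents a scope set with no user-tied scopes, and the user-specific tools drop out of the listing for free.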

Build Stateless, Horizontally Scalable Servers

Run fully stateless: create a new SDK server instance per request and dynamically add allowed tools based on config and policies. Use Redis for minimal session storage (client identity only, no instance affinity). This handles 7M tool calls/week, approaching 8M, with a standard observability stack.
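The stateless pattern reduces to a per-request factory plus a shared session store. This is a hypothetical sketch: a dict stands in for Redis, and the class, toolset names, and config keys are assumptions, not the production server's code:

```python
# Stands in for Redis; in production this is the only shared state.
session_store: dict = {}

ALLOWED_TOOLSETS = {"issues", "pulls", "actions"}  # hypothetical policy

class McpServer:
    """Minimal stand-in for an SDK server instance."""
    def __init__(self, tools: list):
        self.tools = tools

def handle_request(session_id: str, config: dict) -> McpServer:
    # Only client identity lives in the store -- no instance affinity,
    # so any replica can serve any request and scaling is horizontal.
    session_store.setdefault(session_id, {"client": config.get("client")})
    # A fresh server is built per request from config and policy.
    allowed = [t for t in config.get("toolsets", []) if t in ALLOWED_TOOLSETS]
    return McpServer(tools=allowed)

server = handle_request("abc", {"client": "vscode", "toolsets": ["issues", "gists"]})
```

Because the instance is rebuilt each time, tool filtering, policy changes, and rollouts take effect on the next request with no draining or sticky routing.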

Insiders mode flags experiments like MCP apps for human-in-the-loop review (e.g., editing AI-generated issues before posting). Stats: 11M+ Docker downloads (stdio), 30k stars, 4k forks, 126 contributors, 2.3k issues/PRs (>7/day). The open-source local MCP server (April last year) sparked buzz as the most-starred repo that week.

Future: automatic server discovery, compositional tools (piping and streaming, as in bash or Cloudflare's code mode), and tool-search APIs (Anthropic Claude, OpenAI). Thousands of tools will soon be viable via greater autonomy; experiment with harnesses like Pi or MCP CLIs in read-only mode.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge