Agentic Pipelines: Cache Keys Cut Token Bloat by 95%

Intercept tool calls with a ToolOrchestrator that swaps cache keys for large datasets, keeping the LLM context down to metadata. This avoids 50k-token ping-pong, cuts latency and token costs by 95%, and frees the model for pure reasoning.

Ditch Naive Tool Chaining to Stop Token Ping-Pong

Naive agent setups dump raw data, such as 50,000-token SQL results, directly into the LLM context, then pass it back out to Python sandboxes and chart engines, repeating this 3+ times per turn. This chaining anti-pattern causes latency spikes into the minutes, context overflows (max_tokens_exceeded errors), and cognitive degradation: the LLM hallucinates from data overload instead of reasoning.

The fix: treat the LLM as a traffic director, not a data courier. Store heavy payloads (e.g., 50MB datasets) in a session cache and pass only lightweight cache keys (~10 tokens) such as 'cache_key_alpha'. Tools fetch data internally via keys, process it, and return new keys, so the prompt never bloats.

Result: context stays at a few dozen metadata tokens per turn, eliminating repeated transmission of the same dataset.
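A minimal in-memory sketch of this session cache (class and method names are assumptions chosen to match the orchestrator code later in the article):

```python
import uuid
from typing import Any, Dict

class SessionCache:
    """Minimal sketch: heavy payloads stay here; only short keys reach the LLM."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def mint_key(self, prefix: str = "cache_key") -> str:
        # Short, prompt-safe pointer -- the only string the LLM ever sees.
        return f"{prefix}_{uuid.uuid4().hex[:8]}"

    def write_to_cache(self, key: str, payload: Any) -> None:
        self._store[key] = payload

    def read_raw_data(self, key: str) -> Any:
        return self._store[key]

cache = SessionCache()
key = cache.mint_key()
cache.write_to_cache(key, {"rows": list(range(100_000))})  # 50MB-class payload stays server-side
```

Only `key` (a dozen characters) ever enters the prompt; tools call `read_raw_data(key)` internally.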

Orchestrate Data Flow: SQL → Compute → Charts via Keys

Rewire the pipeline so tools communicate through cache without LLM involvement:

  1. Agent calls execute_sql; middleware caches raw data, returns 'SUCCESS: Raw data saved to cache_key_alpha'.
  2. Agent generates Python script, calls execute_python(script, input_key='cache_key_alpha'); sandbox pulls data internally, runs regression, caches output as cache_key_beta, returns success pointer.
  3. Agent calls generate_chart(input_key='cache_key_beta'); engine renders UI from cache, delivers directly to user.
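The three-step flow above can be sketched with stub tools and a plain dict as the cache (names, payloads, and key strings here are illustrative, not the article's actual implementation):

```python
# Illustrative stubs: heavy data moves through `cache`, never through the prompt.
cache = {}

def execute_sql(query: str) -> str:
    # Pretend this is a 50,000-token result set; it stays server-side.
    cache["cache_key_alpha"] = [{"region": "EMEA", "revenue": 1.2}] * 10_000
    return "SUCCESS: Raw data saved to cache_key_alpha"

def execute_python(script: str, input_key: str) -> str:
    rows = cache[input_key]                             # pulled internally via the key
    cache["cache_key_beta"] = {"row_count": len(rows)}  # stand-in for regression output
    return "SUCCESS: Results saved to cache_key_beta"

def generate_chart(input_key: str) -> str:
    stats = cache[input_key]                            # rendered straight from cache
    return f"SUCCESS: Chart rendered ({stats['row_count']} rows)."

# The only strings that ever enter the LLM context:
transcript = [
    execute_sql("SELECT region, revenue FROM sales"),
    execute_python("fit_regression()", input_key="cache_key_alpha"),
    generate_chart(input_key="cache_key_beta"),
]
```

Each transcript entry is a short status string; the 10,000-row payload never leaves the cache.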

Benefits stack: token costs drop by over 95% (short strings instead of entire spreadsheets), time-to-first-token shrinks to near-instant, and the model spends 100% of its attention on planning and synthesis since it never parses raw JSON.
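A back-of-envelope check on the 95% figure, using the article's 50,000-token result and three tool hops per turn (the round-trip accounting is an assumption):

```python
RAW_TOKENS = 50_000   # one raw SQL result in the naive pattern
KEY_TOKENS = 10       # a pointer like 'cache_key_alpha'
HOPS = 3              # tool round-trips per turn

naive_cost = RAW_TOKENS * HOPS * 2     # payload enters and leaves the context each hop
pointer_cost = KEY_TOKENS * HOPS * 2   # only the key makes the round trip
reduction = 1 - pointer_cost / naive_cost
print(f"{reduction:.2%} fewer data tokens")
```

Even with conservative hop counts, the reduction lands well past the 95% mark.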

This mirrors pass-by-reference in programming: functions share memory addresses, not copies. The pattern scales to enterprise-sized queries without collapse.

Build ToolOrchestrator: Middleware for Pointer Magic

Centralize with a ToolOrchestrator class that intercepts every tool call:

  • Resolve pointers: Scan args for 'cache_key_*' strings, swap for raw cache data before execution.
  • Execute + intercept: For execute_python, inject resolved data, run code, cache output, return tiny pointer like 'SUCCESS: Results saved to cache_key_sandbox_output'.
  • Handle charts: Resolve data_json from key, generate UI payload.

Key code:

import uuid
from typing import Any, Dict

class ToolOrchestrator:
    """Middleware that swaps cache keys for raw data around every tool call."""

    def __init__(self, cache_manager, sandbox, chart_engine):
        self.cache = cache_manager
        self.sandbox = sandbox
        self.chart = chart_engine

    def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
        # Swap any 'cache_key_*' argument for its raw payload before execution.
        resolved_args = self._resolve_pointers(arguments)

        if tool_name == "execute_python":
            raw_output = self.sandbox.execute_python(
                code_string=resolved_args.get("script"),
                injected_data=resolved_args.get("data"),
            )
            # Unique key per call so repeated runs never overwrite each other.
            new_key = f"cache_key_sandbox_{uuid.uuid4().hex[:8]}"
            self.cache.write_to_cache(new_key, raw_output)
            return f"SUCCESS: Python execution complete. Results saved to '{new_key}'."

        if tool_name == "generate_chart":
            ui_payload = self.chart.generate_chart(
                data_json=resolved_args.get("data"),
                chart_type=resolved_args.get("chart_type"),
                title="Requested Analysis",
            )
            # Hand the rendered payload to the UI layer via cache; the LLM
            # receives only the short status string.
            self.cache.write_to_cache("cache_key_latest_chart", ui_payload)
            return "SUCCESS: Chart rendered."

        return f"ERROR: Unknown tool '{tool_name}'."

    def _resolve_pointers(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        # Replace string pointers with cached data; pass everything else through.
        resolved = {}
        for key, value in arguments.items():
            if isinstance(value, str) and value.startswith("cache_key_"):
                resolved[key] = self.cache.read_raw_data(value)
            else:
                resolved[key] = value
        return resolved

Integrate with prior cache/sandbox/chart tools from the Cognitive Agent Architecture series for full enterprise agents.


© 2026 Edge