Three Multi-LLM Patterns: Chain, Parallel, Route

Chain LLMs sequentially for step-by-step refinement, run parallel calls for concurrent multi-input tasks, and route inputs to specialized prompts via classification—trading latency or cost for better accuracy.

Sequential Chaining Builds Complex Outputs Step-by-Step

Chain multiple LLM calls where each step refines the previous output, ideal for tasks needing progressive transformation like data extraction and formatting. This trades latency for precision since calls run one after another.

Implement with a simple loop:

def chain(input: str, prompts: list[str]) -> str:
    # Each prompt receives the previous step's output as its input.
    result = input
    for prompt in prompts:
        result = llm_call(f"{prompt}\nInput: {result}")
    return result
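
These snippets assume an llm_call helper that sends one prompt and returns text. A minimal sketch using the OpenAI Python SDK (the client and model choice are assumptions, not from the source):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_call(prompt: str) -> str:
    # Assumed helper: single-turn call that returns the reply text.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any chat-capable model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content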

For a Q3 performance report, four chained prompts extract metrics (e.g., "92 points customer satisfaction"), convert to percentages ("92%: customer satisfaction"), sort descending, then format as a markdown table:

| Metric                | Value |
|-----------------------|-------|
| Customer Satisfaction | 92%   |
| Employee Satisfaction | 87%   |
| Product Adoption      | 78%   |
| Revenue Growth        | 45%   |
| Operating Margin      | 34%   |
| Market Share          | 23%   |
| Customer Churn        | 5%    |

This breaks down intricate formatting that a single prompt might hallucinate or mishandle.
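
A hedged sketch of those four prompts, with wording reconstructed from the description above (q3_report_text is a hypothetical variable holding the raw report):

extraction_prompts = [
    "Extract only the numerical values and their associated metrics from the text. "
    "Format each as 'value: metric' on a new line.",
    "Convert all numerical values to percentages where possible. Keep one per line.",
    "Sort all lines in descending order by numerical value.",
    "Format the sorted data as a markdown table with columns 'Metric' and 'Value'.",
]
report_table = chain(q3_report_text, extraction_prompts)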

Parallel Execution Speeds Up Multi-Stakeholder Analysis

Run the same prompt over multiple inputs concurrently with ThreadPoolExecutor (three workers by default), cutting total latency for independent tasks like impact analysis across stakeholder groups.

Code:

from concurrent.futures import ThreadPoolExecutor

def parallel(prompt: str, inputs: list[str], n_workers: int = 3) -> list[str]:
    # Apply one prompt to many inputs concurrently; results preserve input order.
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(llm_call, f"{prompt}\nInput: {x}") for x in inputs]
        return [f.result() for f in futures]

The example analyzes a market change for four stakeholder groups: customers (price-sensitive, eager for new tech), employees (worried about job security), investors (focused on growth), and suppliers (facing capacity constraints). Each group gets tailored impacts and recommended actions in parallel; for customers, the output highlights pricing strategies and eco-friendly features. Without parallelism the four calls serialize to roughly 4x the latency; concurrency delivers all results near-simultaneously, at the cost of one API call per input.
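
A hedged usage sketch, with illustrative stakeholder descriptions (the exact texts are assumptions):

stakeholders = [
    "Customers: price-sensitive, want newer tech, environmental concerns",
    "Employees: job security worries, need new skills",
    "Investors: expect growth, want cost control",
    "Suppliers: capacity constraints, price pressures",
]
impacts = parallel(
    "Analyze how market changes will impact this stakeholder group. "
    "Provide specific impacts and recommended actions.",
    stakeholders,
)  # four analyses, returned in input order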

Routing Directs Inputs to Specialized Experts

Classify the input first, then route it to a tailored prompt, improving relevance for varied tasks like support tickets. This adds upfront classification latency but leverages specialist personas for better outputs.

The router elicits chain-of-thought reasoning and extracts its selection from XML tags:

def route(input: str, routes: dict[str, str]) -> str:
    # Classify the input, then dispatch to the matching specialized prompt.
    selector_prompt = f"""
    Analyze the input and select the most appropriate team from {list(routes.keys())}.
    Explain your reasoning, then answer in this XML format:
    <reasoning>Brief explanation</reasoning>
    <selection>The chosen team name</selection>
    Input: {input}"""
    route_key = extract_xml(llm_call(selector_prompt), "selection").strip().lower()
    return llm_call(f"{routes[route_key]}\nInput: {input}")
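
extract_xml is not shown in the excerpt; a minimal regex-based sketch:

import re

def extract_xml(text: str, tag: str) -> str:
    # Return the content of the first <tag>...</tag> block, or '' if absent.
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1) if match else ""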

Routes: billing (acknowledge charges, steps), technical (numbered fixes), account (security-first), product (feature education).
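
A hedged sketch of the routes mapping and a call for one ticket (persona prompts abbreviated; exact wording is an assumption):

routes = {
    "billing": "You are a billing support specialist. Acknowledge the charge, explain it clearly, and give concrete next steps.",
    "technical": "You are a technical support engineer. Provide numbered troubleshooting steps.",
    "account": "You are an account security specialist. Prioritize identity verification and safe recovery.",
    "product": "You are a product specialist. Educate the user on features and link to documentation.",
}
reply = route("I can't log in; it says my password is invalid.", routes)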

Ticket examples:

  • Login failure → account: security verification, recovery steps.
  • Unexpected charge → billing: explains the discrepancy, adjustment timeline.
  • Data export → product: step-by-step guide, links to docs.

The router's reasoning cites keywords ("invalid password" → security urgency) and user intent, avoiding generic responses.
