# Three Multi-LLM Patterns: Chain, Parallel, Route
Chain LLMs sequentially for step-by-step refinement, run parallel calls for concurrent multi-input tasks, and route inputs to specialized prompts via classification. Each pattern trades some latency or cost for better accuracy.
## Sequential Chaining Builds Complex Outputs Step-by-Step
Chain multiple LLM calls where each step refines the previous output, ideal for tasks needing progressive transformation like data extraction and formatting. This trades latency for precision since calls run one after another.
Implement with a simple loop:

```python
def chain(input: str, prompts: list[str]) -> str:
    """Feed each step's output in as the next step's input."""
    result = input
    for prompt in prompts:
        result = llm_call(f"{prompt}\nInput: {result}")
    return result
```
For a Q3 performance report, four chained prompts extract metrics (e.g., "92 points customer satisfaction"), convert to percentages ("92%: customer satisfaction"), sort descending, then format as a markdown table:
| Metric | Value |
|---|---|
| Customer Satisfaction | 92% |
| Employee Satisfaction | 87% |
| Product Adoption | 78% |
| Operating Margin | 34% |
| Revenue Growth | 45% |
| Market Share | 23% |
| Customer Churn | 5% |
This breaks down intricate formatting that a single prompt might hallucinate or mishandle.
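The report pipeline above can be sketched end to end. The exact prompt wording and the `llm_call` wrapper are assumptions; `llm_call` is stubbed here so the sketch runs without an API key:

```python
def llm_call(prompt: str) -> str:
    """Stub standing in for a real model API call (assumption)."""
    return f"[output of step: {prompt.splitlines()[0]}]"

def chain(input: str, prompts: list[str]) -> str:
    result = input
    for prompt in prompts:
        result = llm_call(f"{prompt}\nInput: {result}")
    return result

# Hypothetical wording for the four chained steps described above.
report_prompts = [
    "Extract only the numerical values and their associated metrics.",
    "Convert each value to a percentage where possible.",
    "Sort all lines in descending order by numerical value.",
    "Format the sorted data as a markdown table.",
]
report = "Q3 Performance Summary: customer satisfaction rose to 92 points, revenue grew 45%, ..."
final = chain(report, report_prompts)
```

With a real model, `final` would hold the markdown table; each intermediate `result` is a progressively cleaner version of the raw report.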
## Parallel Execution Speeds Up Multi-Stakeholder Analysis
Run identical prompts on multiple inputs concurrently using ThreadPoolExecutor (default 3 workers), cutting total latency for independent tasks like impact analysis across groups.
Code:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel(prompt: str, inputs: list[str], n_workers: int = 3) -> list[str]:
    """Run the same prompt over each input concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(llm_call, f"{prompt}\nInput: {x}") for x in inputs]
        return [f.result() for f in futures]
```
Example analyzes market changes for customers (price-sensitive, tech-wanting), employees (job security), investors (growth-focused), and suppliers (capacity issues). Each gets tailored impacts and actions in parallel, e.g., for customers: highlight pricing strategies and eco-features. Without parallelism these four calls run serially, taking roughly 4x as long; concurrency delivers all results near-simultaneously, though you still pay for one API call per input.
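The stakeholder analysis can be sketched as a single `parallel` call. The stakeholder descriptions are paraphrased from the example above, and `llm_call` is a stub (assumption) so the sketch is runnable:

```python
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt: str) -> str:
    """Stub standing in for a real model API call (assumption)."""
    return f"[impact analysis for: {prompt.splitlines()[-1]}]"

def parallel(prompt: str, inputs: list[str], n_workers: int = 3) -> list[str]:
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(llm_call, f"{prompt}\nInput: {x}") for x in inputs]
        return [f.result() for f in futures]

stakeholders = [
    "Customers: price sensitive, want better tech",
    "Employees: worried about job security",
    "Investors: expect continued growth",
    "Suppliers: facing capacity constraints",
]
results = parallel("Analyze how market changes will impact this stakeholder group.", stakeholders)
```

Because results are collected from the futures list in submission order, the output list lines up with the input list even though the calls complete in any order.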
## Routing Directs Inputs to Specialized Experts
Classify input content first, then route to a tailored prompt, improving relevance for varied tasks like support tickets. Adds upfront classification latency but leverages specialist personas for better outputs.
Router uses chain-of-thought in XML:

```python
def route(input: str, routes: dict[str, str]) -> str:
    """Classify the input, then dispatch to the matching specialist prompt."""
    selector_prompt = f"""
    Analyze... select from {list(routes.keys())}
    <reasoning>Explanation</reasoning>
    <selection>Team</selection>
    Input: {input}"""
    route_key = extract_xml(llm_call(selector_prompt), "selection").strip().lower()
    return llm_call(f"{routes[route_key]}\nInput: {input}")
```
Routes: billing (acknowledge charges, steps), technical (numbered fixes), account (security-first), product (feature education).
Ticket examples:
- Login failure → account: verifies identity, gives recovery steps.
- Unexpected charge → billing: explains the discrepancy, adjustment timeline.
- Data export → product: step-by-step guide, docs links.
Routing reasoning cites keywords ("invalid password" → security urgency) and intent, avoiding generic responses.
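A runnable sketch of the router follows. The `extract_xml` helper and the specialist prompt wording are assumptions, and `llm_call` is stubbed as a keyword classifier so the classification step can be exercised locally:

```python
import re

def extract_xml(text: str, tag: str) -> str:
    """Pull the content of the first <tag>...</tag> pair (assumed helper)."""
    match = re.search(f"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1) if match else ""

def llm_call(prompt: str) -> str:
    """Stub: keyword classifier standing in for a real model (assumption)."""
    if "<selection>" in prompt:  # the selector prompt asks for XML output
        key = "account" if "password" in prompt.lower() else "product"
        return f"<reasoning>keyword match</reasoning><selection>{key}</selection>"
    return f"[specialist response for: {prompt[:60]}]"

def route(input: str, routes: dict[str, str]) -> str:
    selector_prompt = f"""
    Analyze the input and select the best team from {list(routes.keys())}.
    <reasoning>Explanation</reasoning>
    <selection>Team</selection>
    Input: {input}"""
    route_key = extract_xml(llm_call(selector_prompt), "selection").strip().lower()
    return llm_call(f"{routes[route_key]}\nInput: {input}")

# Hypothetical specialist prompts for two of the four routes.
routes = {
    "account": "You are an account security specialist. Verify identity first.",
    "product": "You are a product specialist. Educate on features.",
}
reply = route("I get 'invalid password' every time I try to log in.", routes)
```

The `.strip().lower()` normalization matters in practice: models often pad the `<selection>` value with whitespace or capitalize it, and an unnormalized key would raise a `KeyError` on the routes dict.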