# Three Multi-LLM Patterns: Chain, Parallel, Route
Chain LLMs sequentially for step-by-step refinement, run parallel calls for concurrent multi-input tasks, and route inputs to specialized prompts via classification. Each pattern trades some latency or cost for better accuracy.
## Sequential Chaining Builds Complex Outputs Step-by-Step
Chain multiple LLM calls where each step refines the previous output, ideal for tasks needing progressive transformation like data extraction and formatting. This trades latency for precision since calls run one after another.
Implement with a simple loop:

```python
def chain(input: str, prompts: list[str]) -> str:
    """Feed each step's output in as the next step's input."""
    result = input
    for prompt in prompts:
        result = llm_call(f"{prompt}\nInput: {result}")
    return result
```
For a Q3 performance report, four chained prompts extract metrics (e.g., "92 points customer satisfaction"), convert to percentages ("92%: customer satisfaction"), sort descending, then format as a markdown table:
| Metric | Value |
|---|---|
| Customer Satisfaction | 92% |
| Employee Satisfaction | 87% |
| Product Adoption | 78% |
| Operating Margin | 34% |
| Revenue Growth | 45% |
| Market Share | 23% |
| Customer Churn | 5% |
This breaks down intricate formatting that a single prompt might hallucinate or mishandle.
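The report pipeline above can be sketched end to end. The exact prompt wording and the `llm_call` wrapper are assumptions; `llm_call` is stubbed here so the sketch runs without an API key:

```python
def llm_call(prompt: str) -> str:
    """Stub standing in for a real model API call (assumption)."""
    return f"[output of step: {prompt.splitlines()[0]}]"

def chain(input: str, prompts: list[str]) -> str:
    result = input
    for prompt in prompts:
        result = llm_call(f"{prompt}\nInput: {result}")
    return result

# Hypothetical wording for the four chained steps described above.
report_prompts = [
    "Extract only the numerical values and their associated metrics.",
    "Convert each value to a percentage where possible.",
    "Sort all lines in descending order by numerical value.",
    "Format the sorted data as a markdown table.",
]
report = "Q3 Performance Summary: customer satisfaction rose to 92 points, revenue grew 45%, ..."
final = chain(report, report_prompts)
```

With a real model, `final` would hold the markdown table; each intermediate `result` is a progressively cleaner version of the raw report.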
## Parallel Execution Speeds Up Multi-Stakeholder Analysis
Run identical prompts on multiple inputs concurrently using ThreadPoolExecutor (default 3 workers), cutting total latency for independent tasks like impact analysis across groups.
Code:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel(prompt: str, inputs: list[str], n_workers: int = 3) -> list[str]:
    """Run the same prompt over each input concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(llm_call, f"{prompt}\nInput: {x}") for x in inputs]
        return [f.result() for f in futures]
```
Example analyzes market changes for customers (price-sensitive, tech-wanting), employees (job security), investors (growth-focused), and suppliers (capacity issues). Each gets tailored impacts and actions in parallel, e.g., for customers: highlight pricing strategies and eco-features. Without parallelism these four calls run serially, taking roughly 4x as long; concurrency delivers all results near-simultaneously, though you still pay for one API call per input.
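The stakeholder analysis can be sketched as a single `parallel` call. The stakeholder descriptions are paraphrased from the example above, and `llm_call` is a stub (assumption) so the sketch is runnable:

```python
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt: str) -> str:
    """Stub standing in for a real model API call (assumption)."""
    return f"[impact analysis for: {prompt.splitlines()[-1]}]"

def parallel(prompt: str, inputs: list[str], n_workers: int = 3) -> list[str]:
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(llm_call, f"{prompt}\nInput: {x}") for x in inputs]
        return [f.result() for f in futures]

stakeholders = [
    "Customers: price sensitive, want better tech",
    "Employees: worried about job security",
    "Investors: expect continued growth",
    "Suppliers: facing capacity constraints",
]
results = parallel("Analyze how market changes will impact this stakeholder group.", stakeholders)
```

Because results are collected from the futures list in submission order, the output list lines up with the input list even though the calls complete in any order.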
## Routing Directs Inputs to Specialized Experts
Classify input content first, then route to a tailored prompt, improving relevance for varied tasks like support tickets. Adds upfront classification latency but leverages specialist personas for better outputs.
Router uses chain-of-thought in XML:

```python
def route(input: str, routes: dict[str, str]) -> str:
    """Classify the input, then dispatch to the matching specialist prompt."""
    selector_prompt = f"""
    Analyze... select from {list(routes.keys())}
    <reasoning>Explanation</reasoning>
    <selection>Team</selection>
    Input: {input}"""
    route_key = extract_xml(llm_call(selector_prompt), "selection").strip().lower()
    return llm_call(f"{routes[route_key]}\nInput: {input}")
```
Routes: billing (acknowledge charges, steps), technical (numbered fixes), account (security-first), product (feature education).
Ticket examples:
- Login failure → account: verifies identity, gives recovery steps.
- Unexpected charge → billing: explains the discrepancy, adjustment timeline.
- Data export → product: step-by-step guide, docs links.
Routing reasoning cites keywords ("invalid password" → security urgency) and intent, avoiding generic responses.
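A runnable sketch of the router follows. The `extract_xml` helper and the specialist prompt wording are assumptions, and `llm_call` is stubbed as a keyword classifier so the classification step can be exercised locally:

```python
import re

def extract_xml(text: str, tag: str) -> str:
    """Pull the content of the first <tag>...</tag> pair (assumed helper)."""
    match = re.search(f"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1) if match else ""

def llm_call(prompt: str) -> str:
    """Stub: keyword classifier standing in for a real model (assumption)."""
    if "<selection>" in prompt:  # the selector prompt asks for XML output
        key = "account" if "password" in prompt.lower() else "product"
        return f"<reasoning>keyword match</reasoning><selection>{key}</selection>"
    return f"[specialist response for: {prompt[:60]}]"

def route(input: str, routes: dict[str, str]) -> str:
    selector_prompt = f"""
    Analyze the input and select the best team from {list(routes.keys())}.
    <reasoning>Explanation</reasoning>
    <selection>Team</selection>
    Input: {input}"""
    route_key = extract_xml(llm_call(selector_prompt), "selection").strip().lower()
    return llm_call(f"{routes[route_key]}\nInput: {input}")

# Hypothetical specialist prompts for two of the four routes.
routes = {
    "account": "You are an account security specialist. Verify identity first.",
    "product": "You are a product specialist. Educate on features.",
}
reply = route("I get 'invalid password' every time I try to log in.", routes)
```

The `.strip().lower()` normalization matters in practice: models often pad the `<selection>` value with whitespace or capitalize it, and an unnormalized key would raise a `KeyError` on the routes dict.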