The Failure of Model-Driven Orchestration
Probabilistic systems (LLMs) are ill-suited for deterministic enterprise workflows like payment authorization or vendor onboarding. When models are tasked with orchestration—sequencing tool calls, managing state, and interpreting results—they introduce non-deterministic behavior, security risks, and audit failures. Furthermore, increasing model size or context windows does not solve these issues; it often exacerbates them by increasing the surface area for "lost in the middle" phenomena and tool-space interference, where adding more tools degrades performance.
Shifting the Execution Boundary
The core architectural solution is to move the execution loop out of the model and into a governed runtime. The author introduces "Lattice," an open framework that treats workflows as "capabilities": typed Python contracts. Under this design, the LLM acts only as an intent engine, requesting a high-level outcome (e.g., VendorOnboarding), while the runtime handles the complex, deterministic logic of sequencing, retries, and error recovery.
Capabilities as Governed Code
Capabilities are defined as Python files using decorators to manage the workflow lifecycle:
- Contract Definition: @capability defines the inputs and the "projection" (the small, decision-relevant data returned to the model).
- Step Execution: @step functions define discrete units of work, allowing for dependency management (e.g., running independent steps in parallel) and granular failure policies (@retry, @soft_failure, @hard_failure).
- Security & Audit: Credentials and permissions are managed by the runtime, not the model. By injecting scopes at the capability level, security reviews are simplified from auditing dozens of individual endpoints to auditing a single capability. Every execution generates a structured, queryable audit trail, replacing opaque conversation logs.
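The decorator pattern described in the list can be approximated in plain Python. The sketch below is an illustration under stated assumptions, not Lattice's real decorators: capability and step are defined here from scratch, the retry and soft/hard failure policies are folded into parameters of step rather than separate decorators, and parallel execution of independent steps is left out for brevity.

```python
import functools
from dataclasses import dataclass, field

@dataclass
class AuditTrail:
    """Structured, queryable record of what ran and how it ended."""
    events: list[dict] = field(default_factory=list)

    def record(self, step: str, outcome: str, attempt: int) -> None:
        self.events.append({"step": step, "outcome": outcome, "attempt": attempt})

class HardFailure(Exception):
    """Raised when a step with a hard-failure policy exhausts its attempts."""

def step(retries: int = 0, soft: bool = False):
    """Hypothetical decorator: retry on failure, then apply a soft or hard policy."""
    def wrap(fn):
        @functools.wraps(fn)
        def run(audit: AuditTrail, *args, **kwargs):
            for attempt in range(1, retries + 2):
                try:
                    value = fn(*args, **kwargs)
                    audit.record(fn.__name__, "ok", attempt)
                    return value
                except Exception:
                    audit.record(fn.__name__, "error", attempt)
            if soft:
                return None  # soft failure: the workflow continues without this result
            raise HardFailure(fn.__name__)
        return run
    return wrap

def capability(scopes: list[str]):
    """Hypothetical decorator: expose the scopes to review and create one audit trail per run."""
    def wrap(fn):
        @functools.wraps(fn)
        def run(*args, **kwargs):
            audit = AuditTrail()
            projection = fn(audit, *args, **kwargs)
            return projection, audit
        run.scopes = scopes  # a reviewer inspects this one list, not every endpoint
        return run
    return wrap

@step(retries=1)
def verify_tax_id(tax_id: str) -> bool:
    return tax_id.startswith("US")

@step(soft=True)
def fetch_credit_score(vendor: str) -> int:
    raise TimeoutError("credit bureau unavailable")

@capability(scopes=["vendors:write"])
def vendor_onboarding(audit: AuditTrail, vendor: str, tax_id: str) -> dict:
    tax_ok = verify_tax_id(audit, tax_id)
    score = fetch_credit_score(audit, vendor)  # None after a soft failure
    return {"approved": tax_ok, "credit_checked": score is not None}

projection, audit = vendor_onboarding("Acme", "US-123")
print(projection)
print(audit.events)
```

The audit trail here is a list of dictionaries, so a question such as "which steps errored on this run" becomes a filter over events rather than a search through a conversation log.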
Designing for Decision Surfaces
A capability's value is determined by its "projection." Instead of passing raw API responses (which are sensitive and noisy) back to the model, the runtime filters data into a small, typed object. This projection must enable the model to explain outcomes, present clear alternatives upon failure, and map those alternatives to actionable next steps. This approach keeps sensitive data out of the model's context, mitigating risks like prompt injection and log exposure while significantly reducing token costs.
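As an illustration of such a decision surface, the following sketch filters a raw response into a small typed object. Everything in it is assumed for the example: the shape of raw_response, the response codes, and the names PaymentProjection, Alternative, project, UpdatePaymentMethod, and CreatePaymentPlan do not come from the source.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Alternative:
    label: str        # what the model can tell the user
    next_action: str  # the capability the model may request next

@dataclass(frozen=True)
class PaymentProjection:
    status: str
    reason: str
    alternatives: tuple[Alternative, ...] = ()

# Hypothetical mapping from processor codes to explanations safe to show the model.
DECLINE_REASONS = {"51": "insufficient funds"}

def project(raw: dict) -> PaymentProjection:
    """Reduce a raw (hypothetical) processor response to decision-relevant fields only."""
    code = raw["result"]["code"]
    if code == "00":
        return PaymentProjection(status="authorized", reason="approved by processor")
    return PaymentProjection(
        status="declined",
        reason=DECLINE_REASONS.get(code, "declined by processor"),
        alternatives=(
            Alternative("Use a different card", "UpdatePaymentMethod"),
            Alternative("Split into installments", "CreatePaymentPlan"),
        ),
    )

raw_response = {
    "result": {"code": "51", "message": "DECLINED"},
    "card": {"pan": "4111111111111111", "holder": "A. Vendor"},
    "trace": {"processor_ref": "abc123"},
}

projection = project(raw_response)
print(asdict(projection))
```

The card number and processor reference never appear in the projection, so they cannot end up in the model's context or in conversation logs, while each alternative still names a next step the model can request.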