Code-Driven Workflows Fix LLM Agent Flaws
For deterministic tasks like automatically adding Slack reactions to merged PRs, plain code outperforms LLMs, eliminating the errors that mislead teams while still allowing LLM subagents where intelligence is needed.
Determinism Solves LLM Workflow Reliability Issues
LLMs excel at tool use in complex tasks but fail at simple, repetitive ones that demand perfect accuracy. In a Slack channel for PR reviews, an LLM workflow scanned the last 10 messages, extracted messages containing a single GitHub PR URL, checked each PR's status via the GitHub API, and added a :merged: reaction to closed or merged PRs. The workflow was conceptually sound, but it occasionally added reactions to unmerged PRs, causing teams to skip valid reviews. This undermined the goal: quick visual triage without human intervention. A code-driven alternative is fully accurate by construction, since it executes predefined logic with no hallucination risk, making it cheaper and faster for rule-based automation.
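The deterministic core of this workflow fits in a short script. The sketch below is illustrative, not the original implementation: the helper names, message format, and token handling are assumptions, though the GitHub pulls endpoint and Slack reactions.add method are the standard APIs.

```python
import json
import re
import urllib.request

# Matches a GitHub PR URL and captures owner, repo, and PR number.
PR_URL = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)/pull/(\d+)")

def extract_pr(text):
    """Return (owner, repo, number) when a message contains exactly one PR URL."""
    matches = PR_URL.findall(text)
    return matches[0] if len(matches) == 1 else None

def _api_call(url, token, payload=None):
    """Minimal stdlib HTTP helper (GET, or POST when a payload is given)."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def pr_is_done(owner, repo, number, gh_token):
    """Deterministic status check via the GitHub REST pulls endpoint."""
    pr = _api_call(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}", gh_token
    )
    return pr["merged"] or pr["state"] == "closed"

def run(messages, channel, gh_token, slack_token):
    """Scan the last 10 messages and react only to verifiably finished PRs."""
    for msg in messages[:10]:
        pr = extract_pr(msg.get("text", ""))
        if pr and pr_is_done(*pr, gh_token):
            _api_call(
                "https://slack.com/api/reactions.add",  # Slack Web API method
                slack_token,
                {"channel": channel, "timestamp": msg["ts"], "name": "merged"},
            )
```

Unlike the LLM version, this script can never "decide" a PR is merged: the reaction is added only when the GitHub API says so.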
Trade-off: Pure LLMs offer flexibility for novel scenarios but introduce non-determinism, eroding trust. Use code when rules are clear and errors costly.
Hybrid Config Enables Code or LLM Coordinators
Orchestrate workflows via a handler that selects a config based on the trigger (e.g., a Slack event). The default is coordinator: llm, which pairs a prompt with tools and virtual files (like Jira attachments). For custom Python, set coordinator: script with coordinator_script: scripts/pr_merged.py.
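A config for each mode might look like the following. The coordinator, coordinator_script, prompt, and tools field names come from the description above; the surrounding structure is a hypothetical sketch.

```yaml
# Default mode: an LLM coordinates, driven by a prompt plus tools and virtual files.
workflow: pr-review-triage
trigger: slack_event
coordinator: llm
prompt: prompts/pr_triage.md
tools: [github, slack]
---
# Deterministic mode: a reviewed Python script coordinates instead.
workflow: pr-merged-reactions
trigger: slack_event
coordinator: script
coordinator_script: scripts/pr_merged.py
```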
Scripts receive the same inputs as an LLM coordinator—triggers, tools, and virtual files—plus a subagent tool for invoking LLMs selectively. Engineers write and review these scripts via PRs, so they can add dependencies or tweak logic like any other code. The handler skips LLM orchestration entirely, running the script directly until it terminates.
This preserves LLM power (e.g., subagents with full tools) inside reliable code shells, avoiding excessive tool loops via built-in limits.
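A coordinator script in this shape might look like the sketch below. The ctx object (trigger payload, tools, and subagent callable) is an assumed handler interface, not a documented API; it illustrates the pattern of deterministic code delegating one judgment call to an LLM subagent.

```python
# Hypothetical coordinator script (e.g., scripts/pr_merged.py): the handler
# injects `ctx` with the same inputs an LLM coordinator would get.

def coordinate(ctx):
    """Deterministic shell: code decides the control flow; the LLM subagent
    is invoked only for steps that genuinely need judgment."""
    reacted = []
    for msg in ctx.trigger["messages"][:10]:
        pr = ctx.tools.github.get_pr_from_text(msg["text"])
        if pr is None:
            continue  # no single PR URL in this message
        if pr["merged"] or pr["state"] == "closed":
            # Rule-based step: no LLM involved, so no hallucinated reactions.
            ctx.tools.slack.add_reaction(msg["ts"], "merged")
            reacted.append(msg["ts"])
        elif pr.get("needs_summary"):
            # Selective intelligence: delegate only this ambiguous step.
            ctx.subagent(prompt=f"Summarize review status of PR #{pr['number']}")
    return reacted
```

Because the loop and termination live in code, the handler never risks an LLM wandering into an excessive tool loop; the subagent is a bounded helper, not the driver.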
Code as Progressive Enhancement Boosts Workflow Speed
Start with LLM configs for quick iteration—they handle many cases well. Rewrite flaky ones as code using Claude, which can convert a prompt to a script in one shot. The result: code for frequent, error-prone tasks; LLMs where intelligence is needed. Even as models improve, narrowing LLM use preserves determinism where it matters, forming a robust toolkit for internal agents.