SPDD: Scale LLM Coding to Teams via Structured Prompts
Structured Prompt-Driven Development (SPDD) treats prompts as versioned artifacts using a REASONS canvas and workflow to make AI-generated code governable, reviewable, and reusable across teams.
Prompts as First-Class Artifacts to Bridge Individual and Team Gains
AI coding assistants boost individual developer speed, but teams face friction as ambiguous requirements scale into misunderstandings, harder reviews, integration issues, and production risks. Thoughtworks' internal IT teams developed Structured Prompt-Driven Development (SPDD) to make LLM-assisted changes governable at scale. Instead of ad hoc chats, SPDD elevates prompts to version-controlled assets alongside code, capturing requirements, domain models, design intent, constraints, and tasks. Reviews then center on a single artifact rather than scattered chat logs or diffs.
The core problem: local speed (the "Ferrari engine") doesn't fix systemic constraints such as poor alignment (the "muddy roads"). SPDD rejects freeform prompting in favor of a structured approach, drawing on spec-driven development but treating prompts as living specs that co-evolve with code. When code diverges from intent, update the prompt first, enforcing a closed loop in which feedback refines intent before implementation.
"It's like buying a Ferrari and driving it on muddy roads: the engine is powerful, but your arrival time is determined by road conditions and traffic." This analogy from the authors highlights why individual AI wins fail organizationally without governance.
REASONS Canvas: Abstract Intent Before Concrete Execution
SPDD's foundation is the REASONS Canvas, a seven-part prompt structure forcing clarity on intent, design, execution, and governance before code generation.
| Section | Focus | Why It Matters |
|---|---|---|
| R - Requirements | Problem, Definition of Done | Aligns on business value and success metrics. |
| E - Entities | Domain model, relationships | Grounds in shared domain language. |
| A - Approach | High-level strategy | Sets solution direction with trade-offs. |
| S - Structure | System fit, components, deps | Ensures architectural consistency. |
| O - Operations | Task breakdown, testable steps | Makes execution concrete and verifiable. |
| N - Norms | Naming, observability, coding standards | Enforces team conventions. |
| S - Safeguards | Invariants, perf limits, security | Prevents regressions. |
Abstract sections (R, E, A, S) capture design before specifics; execution (O) follows; governance (N, S) bounds the output. This shifts uncertainty left and compounds expertise across iterations into reusable prompt libraries. Reviewers validate one canvas, not code alone.
Decision chain: the teams considered ad hoc versus structured prompts and chose REASONS because it balances expressiveness with consistency: too vague risks hallucination, too rigid stifles creativity. Trade-off: upfront canvas time (10-30 minutes) pays off in predictable generations and fewer review cycles.
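As an illustration of how such a canvas might be stored and reviewed as a versioned artifact, here is a minimal sketch; the field names and types are assumptions, since SPDD's canvas is a structured prompt document rather than code.

```typescript
// Hypothetical shape of a REASONS canvas as a typed, version-controlled artifact.
// Field names are illustrative, not SPDD's actual format.
interface ReasonsCanvas {
  requirements: { problem: string; definitionOfDone: string[] };      // R: business value, success metrics
  entities: { name: string; relationships: string[] }[];              // E: domain model in shared language
  approach: string;                                                   // A: high-level strategy and trade-offs
  structure: { systemFit: string; components: string[]; dependencies: string[] }; // S: architectural consistency
  operations: { task: string; verification: string }[];               // O: concrete, testable steps
  norms: string[];                                                    // N: naming, observability, coding standards
  safeguards: string[];                                               // S: invariants, performance limits, security
}
```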
Closed-Loop Workflow Powered by openspdd CLI
SPDD integrates prompts into git workflows via openspdd, a CLI tool with commands enforcing discipline:
| Command | Purpose | Key Benefit |
|---|---|---|
| spdd-story | Split requirements into INVEST stories | Manages large epics. |
| spdd-analysis | Extract domain keywords, scan code, analyze risks | Contextualizes without full codebase dump. |
| spdd-reasons-canvas | Build full canvas from analysis | Generates executable blueprint. |
| spdd-generate | Produce code task-by-task per canvas | Bounded, reproducible output. |
| spdd-api-test | Curl-based E2E tests | Verifies ACs. |
| spdd-prompt-update | Evolve canvas on requirement changes | Requirement → prompt → code. |
| spdd-sync | Back-propagate code changes to canvas | Code → prompt sync. |
Workflow: Requirements → Analysis → Canvas → Code → Tests → Review → Commit. Rule: Divergence? Fix prompt first. This creates short feedback loops within iterations and cumulative context across them, turning prompts into a library.
Compared to spec-driven development, SPDD adds governance via versioned prompts and sync mechanisms. Trade-offs: the tool overhead isn't worth it for small changes (skip SPDD for trivial edits); it shines on enhancements where context matters.
"When reality diverges, fix the prompt first — then update the code." This rule from the workflow prevents intent drift, making SPDD a true closed loop.
Billing Engine Enhancement: From Static to Dynamic Pricing
Example: Enhance a token-based LLM billing engine (GitHub: token-billing, iteration-1 baseline) for model-aware, multi-plan billing.
Before: a single global rate and a single quota for all users.
Opportunity: user feedback demands model-specific rates (e.g., fast-model at $0.01/1K tokens, reasoning-model at $0.03/1K), a Standard plan (quota plus overage), and a Premium plan (no quota, prompt and completion tokens billed separately).
Options considered: a monolithic if-else versus extensible patterns. The team rejected tight coupling and chose Strategy and Factory patterns for plan handling, respecting interface segregation and single responsibility (ISP/SRP).
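A minimal sketch of that design follows, reusing the class names the step chain reports below (ModelRateRepository, PlanStrategyFactory, plan strategies); the signatures and rate handling are assumptions, not the repository's actual code.

```typescript
// Illustrative Strategy/Factory sketch for plan-specific billing (not the repo's actual code).
interface UsageEvent {
  modelId: string;
  promptTokens: number;
  completionTokens: number;
}

interface BillingResult {
  billedTokens: number; // tokens charged beyond any remaining quota
  charge: number;       // dollars
}

// Looks up per-model rates; unknown models are rejected (surfaced as 404 at the API layer).
class ModelRateRepository {
  constructor(private ratesPer1K: Record<string, number>) {}
  ratePer1K(modelId: string): number {
    const rate = this.ratesPer1K[modelId];
    if (rate === undefined) throw new Error(`unknown model: ${modelId}`);
    return rate;
  }
}

interface PlanStrategy {
  bill(usage: UsageEvent, ratePer1K: number, remainingQuota: number): BillingResult;
}

// Standard: consume remaining quota first, then charge overage at the model's rate.
class StandardPlanStrategy implements PlanStrategy {
  bill(usage: UsageEvent, ratePer1K: number, remainingQuota: number): BillingResult {
    const total = usage.promptTokens + usage.completionTokens;
    const overage = Math.max(0, total - remainingQuota);
    return { billedTokens: overage, charge: (overage / 1000) * ratePer1K };
  }
}

// Premium: no quota, billed from the first token. The article bills prompt and completion
// tokens separately, which would need per-part rates; a single rate keeps the sketch short.
class PremiumPlanStrategy implements PlanStrategy {
  bill(usage: UsageEvent, ratePer1K: number, _remainingQuota: number): BillingResult {
    const total = usage.promptTokens + usage.completionTokens;
    return { billedTokens: total, charge: (total / 1000) * ratePer1K };
  }
}

// Routes a plan type to its strategy, so new plans slot in without touching existing ones.
class PlanStrategyFactory {
  static for(plan: "standard" | "premium"): PlanStrategy {
    return plan === "standard" ? new StandardPlanStrategy() : new PremiumPlanStrategy();
  }
}
```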
Step chain:
- /spdd-story on the enhancement idea → two stories (Standard + Premium), consolidated into one with Given/When/Then ACs (e.g., Standard overage: 100K quota, 90K used, 30K fast-model tokens → $0.20 charge).
- Manual clarification: core logic (routing by plan), scope (calculation only, no CRUD), DoD (four scenarios).
- /spdd-analysis → domain concepts (new: modelId, plans), risks (edge cases like negative tokens), strategy (Strategy pattern).
- Review: aligned on OOP principles; surfaced extra edge cases (e.g., unknown models → 404).
- /spdd-reasons-canvas → full prompt with REASONS sections.
- /spdd-generate → code: modelId validation, ModelRateRepository, PlanStrategyFactory, Standard/Premium strategy implementations.
- /spdd-api-test → curl tests for the ACs.
Results: the API now handles modelId, dynamic rates, and plan-specific logic. Quota exhaustion is handled correctly, and Premium bills prompt and completion tokens separately (e.g., 10K prompt + 20K completion tokens on the reasoning model → $1.50). The design is extensible for future plans.
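Using the hypothetical sketch above, the Standard-plan acceptance criterion can be checked by hand; the 10K/20K prompt/completion split is arbitrary, since the Standard plan bills total tokens.

```typescript
// Standard AC: 100K quota, 90K already used, 30K new fast-model tokens at $0.01/1K.
const rates = new ModelRateRepository({ "fast-model": 0.01, "reasoning-model": 0.03 });
const result = PlanStrategyFactory.for("standard").bill(
  { modelId: "fast-model", promptTokens: 10_000, completionTokens: 20_000 },
  rates.ratePer1K("fast-model"),
  100_000 - 90_000, // remaining quota: 10K tokens still free
);
// 10K tokens fall within quota; the remaining 20K bill at $0.01/1K => $0.20
console.log(result.charge); // 0.2
```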
Trade-offs: the factory adds indirection (a minor performance hit, justified by extensibility); the analysis review caught risks early.
The repository diffs show the full artifacts: prompts, code, and tests. The exercise is replicable in about an hour.
"The AI's analysis largely aligned with our architectural intent; in fact, its considerations were even more comprehensive than ours in certain areas." Review insight shows AI augmenting human foresight.
Three Core Skills: Alignment, Abstraction-First, Iterative Review
SPDD's effectiveness demands three skills:
- Alignment: Review analysis/canvas against human understanding; catch misalignments early.
- Abstraction-first: Define intent/design before ops; prevents premature optimization.
- Iterative review: Treat prompts as code—peer review, refine on divergence.
These counter LLM non-determinism, turning variability into strength via governance.
"Reviews move away from 'spot the bug' toward 'check the intent.'" Captures SPDD's review shift.
Fitness and Trade-offs: Not for Every Change
Assess fit: SPDD suits high-context enhancements, teams new to AI assistance (it builds discipline), and domains with reusable patterns. Skip it for one-line changes.
Trade-offs:
- Pros: Consistency, reuse, safer scaling.
- Cons: Prompt overhead (5-20% more upfront effort), learning curve, tool dependency.
Fits AI-First Software Delivery; breaks the "expert-only" barrier by codifying expertise.
Key Takeaways
- Treat prompts as git-tracked artifacts to scale AI beyond solo devs.
- Use REASONS Canvas: Abstract (REAS) → Execute (O) → Govern (NS).
- Enforce 'fix prompt first' on divergence for closed-loop evolution.
- Leverage openspdd CLI for workflow: analysis → canvas → generate → sync.
- Review at analysis/canvas stages; abstraction-first uncovers edges.
- Ideal for enhancements: e.g., add model-aware billing via Strategy pattern.
- Builds prompt libraries compounding team knowledge.
- Trade-off: Upfront structure for downstream predictability.
- Core skills: Align intents, abstract before code, iterate reviews.