SPDD: Scale LLM Coding to Teams via Structured Prompts
Structured Prompt-Driven Development (SPDD) treats prompts as versioned artifacts using a REASONS canvas and workflow to make AI-generated code governable, reviewable, and reusable across teams.
Prompts as First-Class Artifacts to Bridge Individual and Team Gains
AI coding assistants boost individual developer speed, but teams face friction as ambiguous requirements scale into misunderstandings, harder reviews, integration issues, and production risks. Thoughtworks' internal IT teams developed Structured Prompt-Driven Development (SPDD) to make LLM-assisted changes governable at scale. Instead of ad hoc chats, SPDD elevates prompts to version-controlled assets alongside code, capturing requirements, domain models, design intent, constraints, and tasks. Reviews then center on a single artifact rather than scattered chat logs or diffs.
The core problem: local speed (the "Ferrari engine") doesn't fix systemic constraints such as poor alignment (the "muddy roads"). SPDD rejects freeform prompting in favor of a structured approach, drawing on spec-driven development but treating prompts as living specs that co-evolve with code. When code diverges from intent, update the prompt first, enforcing a closed loop in which feedback refines intent before implementation.
"It's like buying a Ferrari and driving it on muddy roads: the engine is powerful, but your arrival time is determined by road conditions and traffic." This analogy from the authors highlights why individual AI wins fail organizationally without governance.
REASONS Canvas: Abstract Intent Before Concrete Execution
SPDD's foundation is the REASONS Canvas, a seven-part prompt structure forcing clarity on intent, design, execution, and governance before code generation.
| Section | Focus | Why It Matters |
|---|---|---|
| R - Requirements | Problem, Definition of Done | Aligns on business value and success metrics. |
| E - Entities | Domain model, relationships | Grounds in shared domain language. |
| A - Approach | High-level strategy | Sets solution direction with trade-offs. |
| S - Structure | System fit, components, deps | Ensures architectural consistency. |
| O - Operations | Task breakdown, testable steps | Makes execution concrete and verifiable. |
| N - Norms | Naming, observability, coding standards | Enforces team conventions. |
| S - Safeguards | Invariants, perf limits, security | Prevents regressions. |
Abstract sections (R, E, A, S) capture design before specifics; execution (O) follows; governance (N, S) bounds the output. This shifts uncertainty left and compounds expertise across iterations into reusable prompt libraries. Reviewers validate one canvas, not code alone.
Decision chain: the teams considered ad hoc versus structured prompts and chose REASONS because it balances expressiveness with consistency: too vague risks hallucination, too rigid stifles creativity. Trade-off: upfront canvas time (10-30 minutes) pays off in predictable generations and fewer review cycles.
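As an illustration of how such a canvas might be stored and reviewed as a versioned artifact, here is a minimal sketch; the field names and types are assumptions, since SPDD's canvas is a structured prompt document rather than code.

```typescript
// Hypothetical shape of a REASONS canvas as a typed, version-controlled artifact.
// Field names are illustrative, not SPDD's actual format.
interface ReasonsCanvas {
  requirements: { problem: string; definitionOfDone: string[] };      // R: business value, success metrics
  entities: { name: string; relationships: string[] }[];              // E: domain model in shared language
  approach: string;                                                   // A: high-level strategy and trade-offs
  structure: { systemFit: string; components: string[]; dependencies: string[] }; // S: architectural consistency
  operations: { task: string; verification: string }[];               // O: concrete, testable steps
  norms: string[];                                                    // N: naming, observability, coding standards
  safeguards: string[];                                               // S: invariants, performance limits, security
}
```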
Closed-Loop Workflow Powered by openspdd CLI
SPDD integrates prompts into git workflows via openspdd, a CLI tool with commands enforcing discipline:
| Command | Purpose | Key Benefit |
|---|---|---|
| spdd-story | Split requirements into INVEST stories | Manages large epics. |
| spdd-analysis | Extract domain keywords, scan code, analyze risks | Contextualizes without full codebase dump. |
| spdd-reasons-canvas | Build full canvas from analysis | Generates executable blueprint. |
| spdd-generate | Produce code task-by-task per canvas | Bounded, reproducible output. |
| spdd-api-test | Curl-based E2E tests | Verifies ACs. |
| spdd-prompt-update | Evolve canvas on requirement changes | Requirement → prompt → code. |
| spdd-sync | Back-propagate code changes to canvas | Code → prompt sync. |
Workflow: Requirements → Analysis → Canvas → Code → Tests → Review → Commit. Rule: Divergence? Fix prompt first. This creates short feedback loops within iterations and cumulative context across them, turning prompts into a library.
Compared to spec-driven development, SPDD adds governance via versioned prompts and sync mechanisms. Trade-offs: the tool overhead isn't worth it for small changes (skip SPDD for trivial edits); it shines on enhancements where context matters.
"When reality diverges, fix the prompt first — then update the code." This rule from the workflow prevents intent drift, making SPDD a true closed loop.
Billing Engine Enhancement: From Static to Dynamic Pricing
Example: Enhance a token-based LLM billing engine (GitHub: token-billing, iteration-1 baseline) for model-aware, multi-plan billing.
Before: a single global rate and a single quota for all users.
Opportunity: user feedback demands model-specific rates (e.g., fast-model at $0.01/1K tokens, reasoning-model at $0.03/1K), a Standard plan (quota plus overage), and a Premium plan (no quota, prompt and completion tokens billed separately).
Options considered: a monolithic if-else versus extensible patterns. The team rejected tight coupling and chose Strategy and Factory patterns for plan handling, respecting interface segregation and single responsibility (ISP/SRP).
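A minimal sketch of that design follows, reusing the class names the step chain reports below (ModelRateRepository, PlanStrategyFactory, plan strategies); the signatures and rate handling are assumptions, not the repository's actual code.

```typescript
// Illustrative Strategy/Factory sketch for plan-specific billing (not the repo's actual code).
interface UsageEvent {
  modelId: string;
  promptTokens: number;
  completionTokens: number;
}

interface BillingResult {
  billedTokens: number; // tokens charged beyond any remaining quota
  charge: number;       // dollars
}

// Looks up per-model rates; unknown models are rejected (surfaced as 404 at the API layer).
class ModelRateRepository {
  constructor(private ratesPer1K: Record<string, number>) {}
  ratePer1K(modelId: string): number {
    const rate = this.ratesPer1K[modelId];
    if (rate === undefined) throw new Error(`unknown model: ${modelId}`);
    return rate;
  }
}

interface PlanStrategy {
  bill(usage: UsageEvent, ratePer1K: number, remainingQuota: number): BillingResult;
}

// Standard: consume remaining quota first, then charge overage at the model's rate.
class StandardPlanStrategy implements PlanStrategy {
  bill(usage: UsageEvent, ratePer1K: number, remainingQuota: number): BillingResult {
    const total = usage.promptTokens + usage.completionTokens;
    const overage = Math.max(0, total - remainingQuota);
    return { billedTokens: overage, charge: (overage / 1000) * ratePer1K };
  }
}

// Premium: no quota, billed from the first token. The article bills prompt and completion
// tokens separately, which would need per-part rates; a single rate keeps the sketch short.
class PremiumPlanStrategy implements PlanStrategy {
  bill(usage: UsageEvent, ratePer1K: number, _remainingQuota: number): BillingResult {
    const total = usage.promptTokens + usage.completionTokens;
    return { billedTokens: total, charge: (total / 1000) * ratePer1K };
  }
}

// Routes a plan type to its strategy, so new plans slot in without touching existing ones.
class PlanStrategyFactory {
  static for(plan: "standard" | "premium"): PlanStrategy {
    return plan === "standard" ? new StandardPlanStrategy() : new PremiumPlanStrategy();
  }
}
```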
Step chain:
- /spdd-story on the enhancement idea → two stories (Standard + Premium), consolidated into one with Given/When/Then ACs (e.g., Standard overage: 100K quota, 90K used, 30K fast-model tokens → $0.20 charge).
- Manual clarification: core logic (routing by plan), scope (calculation only, no CRUD), DoD (four scenarios).
- /spdd-analysis → domain concepts (new: modelId, plans), risks (edge cases like negative tokens), strategy (Strategy pattern).
- Review: aligned on OOP principles; surfaced extra edge cases (e.g., unknown models → 404).
- /spdd-reasons-canvas → full prompt with REASONS sections.
- /spdd-generate → code: modelId validation, ModelRateRepository, PlanStrategyFactory, Standard/Premium strategy implementations.
- /spdd-api-test → curl tests for the ACs.
Results: the API now handles modelId, dynamic rates, and plan-specific logic. Quota exhaustion is handled correctly, and Premium bills prompt and completion tokens separately (e.g., 10K prompt + 20K completion tokens on the reasoning model → $1.50). The design is extensible for future plans.
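Using the hypothetical sketch above, the Standard-plan acceptance criterion can be checked by hand; the 10K/20K prompt/completion split is arbitrary, since the Standard plan bills total tokens.

```typescript
// Standard AC: 100K quota, 90K already used, 30K new fast-model tokens at $0.01/1K.
const rates = new ModelRateRepository({ "fast-model": 0.01, "reasoning-model": 0.03 });
const result = PlanStrategyFactory.for("standard").bill(
  { modelId: "fast-model", promptTokens: 10_000, completionTokens: 20_000 },
  rates.ratePer1K("fast-model"),
  100_000 - 90_000, // remaining quota: 10K tokens still free
);
// 10K tokens fall within quota; the remaining 20K bill at $0.01/1K => $0.20
console.log(result.charge); // 0.2
```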
Trade-offs: the factory adds indirection (a minor performance hit, justified by extensibility); the analysis review caught risks early.
The repository diffs show the full artifacts: prompts, code, and tests. The exercise is replicable in about an hour.
"The AI's analysis largely aligned with our architectural intent; in fact, its considerations were even more comprehensive than ours in certain areas." Review insight shows AI augmenting human foresight.
Three Core Skills: Alignment, Abstraction-First, Iterative Review
SPDD's effectiveness demands three skills:
- Alignment: Review analysis/canvas against human understanding; catch misalignments early.
- Abstraction-first: Define intent/design before ops; prevents premature optimization.
- Iterative review: Treat prompts as code—peer review, refine on divergence.
These counter LLM non-determinism, turning variability into strength via governance.
"Reviews move away from 'spot the bug' toward 'check the intent.'" Captures SPDD's review shift.
Fitness and Trade-offs: Not for Every Change
Assess fit: SPDD suits high-context enhancements, teams new to AI assistance (it builds discipline), and domains with reusable patterns. Skip it for one-line changes.
Trade-offs:
- Pros: Consistency, reuse, safer scaling.
- Cons: Prompt overhead (5-20% more upfront effort), learning curve, tool dependency.
Fits AI-First Software Delivery; breaks the "expert-only" barrier by codifying expertise.
Key Takeaways
- Treat prompts as git-tracked artifacts to scale AI beyond solo devs.
- Use REASONS Canvas: Abstract (REAS) → Execute (O) → Govern (NS).
- Enforce 'fix prompt first' on divergence for closed-loop evolution.
- Leverage openspdd CLI for workflow: analysis → canvas → generate → sync.
- Review at analysis/canvas stages; abstraction-first uncovers edges.
- Ideal for enhancements: e.g., add model-aware billing via Strategy pattern.
- Builds prompt libraries compounding team knowledge.
- Trade-off: Upfront structure for downstream predictability.
- Core skills: Align intents, abstract before code, iterate reviews.