5 Prompt Techniques for Reliable LLM Outputs
Role-specific personas, negative constraints, JSON schemas, ARQ checklists, and verbalized sampling produce consistent, structured LLM outputs without fine-tuning or model changes.
Condition Model Responses with Personas and Constraints
Assign domain-specific roles in the system prompt to filter the model's knowledge and shift its framing toward expert priorities. For a web app storing session tokens in localStorage, a generic assistant notes XSS risks and tradeoffs, but a 'senior application security researcher specializing in web authentication vulnerabilities' frames it as an attack surface: attackers can steal the tokens via XSS and hijack sessions. The persona also cites OWASP guidance and recommends HttpOnly cookies instead.
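A minimal sketch of wiring the persona into the system message, assuming an OpenAI-compatible chat client; the model name, persona wording, and user question are illustrative:

from openai import OpenAI

client = OpenAI()

# The persona lives in the system message, so it conditions every reply in the thread
persona = (
    "You are a senior application security researcher specializing in "
    "web authentication vulnerabilities. Frame findings as attack surface, "
    "cite relevant OWASP guidance, and recommend concrete mitigations."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "Our web app stores session tokens in localStorage. Is that OK?"},
    ],
)
print(response.choices[0].message.content)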
Combine roles with negative prompting to eliminate RLHF-induced noise such as hedging, excess analogies, filler phrases ('great question'), and redundant summaries. Prompt a 'senior backend engineer writing internal documentation' with explicit rules: no marketing language, resolve 'it depends' immediately, at most a one-sentence analogy if one is needed, and stop after making the point. For explaining database indexes, this cuts the verbose baseline (headers, analogies, a concluding summary) down to the facts: indexes speed up queries on WHERE/JOIN/ORDER BY columns via B-trees; use them on high-cardinality filtered columns and avoid them on low-cardinality or write-heavy tables.
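A sketch of the constrained system prompt, reusable with any chat client; the exact rule wording is illustrative:

# Negative constraints ride along in the same system message as the role
SYSTEM_PROMPT = """You are a senior backend engineer writing internal documentation.
Rules:
- No marketing language or enthusiasm ("great question", "powerful").
- If the answer is "it depends", state what it depends on immediately.
- At most one short analogy, and only if it genuinely helps.
- Stop as soon as the point is made. No closing summary."""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "When should I add an index to a table?"},
]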
Enforce Parseable Structures with JSON and ARQ
Define exact JSON schemas in prompts to constrain outputs for code consumption, eliminating inconsistent free-form text. For product review parsing, specify a schema with 'overall_sentiment' (positive/negative/mixed), 'rating' (integer 1-5), 'pros'/'cons' arrays, and 'recommended_for'/'not_recommended_for' strings, with the system prompt: 'You MUST return only a valid JSON object. No preamble, no explanation.' The baseline mixes pros and cons into narrative; the JSON version yields {"overall_sentiment": "mixed", "rating": 3, "pros": ["Stunning display", "Comfortable keyboard"], "cons": ["Poor battery life (dies during a 6-hour workday)", "Aggressive fan noise"], "recommended_for": "Light work users", "not_recommended_for": "Users running heavy software"}, which parses directly with json.loads() for storage and querying.
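A sketch of the schema-constrained call and the downstream parse, again assuming an OpenAI-compatible client; the model name and review text are illustrative:

import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You MUST return only a valid JSON object. No preamble, no explanation.
Schema:
{
  "overall_sentiment": "positive" | "negative" | "mixed",
  "rating": <integer 1-5>,
  "pros": [<string>, ...],
  "cons": [<string>, ...],
  "recommended_for": <string>,
  "not_recommended_for": <string>
}"""

review = ("Stunning display and a comfortable keyboard, but the battery dies "
          "during a 6-hour workday and the fans get aggressive under load.")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": review},
    ],
    response_format={"type": "json_object"},  # enforce JSON output where the API supports it
)

parsed = json.loads(response.choices[0].message.content)  # parse directly for storage/querying
print(parsed["rating"], parsed["pros"])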
Attentive Reasoning Queries (ARQs) impose an ordered, domain-specific checklist so the model covers every angle, outperforming unstructured chain-of-thought. For a review of unsafe SQL (f"SELECT * FROM users WHERE id = {user_id}"), the checklist runs: Q1-Security (SQL injection via unsanitized user_id), Q2-Error handling (an unhandled db.execute() exception crashes the caller), Q3-Performance (SELECT * fetches unnecessary columns and scales poorly), Q4-Correctness (indexing the result as result[0] assumes exactly one row and fails otherwise), Q5-Fix (parameterized query, SELECT specific columns, fetchone(), error handling). The baseline review drifts; ARQ delivers a systematic analysis and the fixed code below, followed by a sketch of the checklist prompt:
def get_user(user_id):
    try:
        # Parameterized query prevents SQL injection; select only the needed columns
        query = "SELECT id, username, email FROM users WHERE id = %s"
        result = db.execute(query, (user_id,))
        row = result.fetchone()  # fetch once; a second fetchone() would advance the cursor
        return dict(row) if row else None
    except Exception:
        # Database errors surface as "no user" rather than crashing the caller
        return None
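A sketch of an ARQ-style checklist prompt; the question list mirrors Q1-Q5 above, and the wording is illustrative rather than a fixed format:

ARQ_REVIEW_PROMPT = """Review the code below by answering each question in order.
Do not skip a question; write "none found" if it does not apply.

Q1 (Security): What injection or data-exposure risks exist?
Q2 (Error handling): Which failures are unhandled, and what happens when they occur?
Q3 (Performance): What will not scale, and why?
Q4 (Correctness): What inputs or result shapes break the logic?
Q5 (Fix): Provide a corrected version that addresses Q1-Q4.

Code:
{code_under_review}"""

snippet = 'result = db.execute(f"SELECT * FROM users WHERE id = {user_id}")'
prompt = ARQ_REVIEW_PROMPT.format(code_under_review=snippet)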
Generate Multiple Hypotheses to Reveal Uncertainty
Verbalized sampling prompts for three or more ranked hypotheses, each with a confidence score (0.0-1.0), a failure mode, the information that would validate it, and a recommended agent action, countering the single confident answer a baseline produces. For a support ticket ('can't log in, the password reset email never arrives'), the baseline picks one issue; the verbalized version lists: 1. Email delivery (0.85: the reset email is sent but never arrives; check spam folders and DNS), 2. Account state (0.70: a new or flagged account is locked; check account flags), 3. Authentication (0.40: wrong credentials; check for recent successful logins). It then recommends a next action: ask for the email provider and have the user check spam. This aids prioritization without ensemble sampling.
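A sketch of a verbalized-sampling system prompt; the hypothesis count, score range, and field labels are illustrative:

VERBALIZED_SAMPLING_PROMPT = """For the user's problem, list at least 3 distinct hypotheses, ranked by likelihood.
For each hypothesis give:
- confidence: a score from 0.0 to 1.0
- failure_mode: what is going wrong if this hypothesis is true
- validation: what information would confirm or rule it out
End with a single recommended next action for the agent."""

ticket = "I can't log in, and the password reset email never arrives."
messages = [
    {"role": "system", "content": VERBALIZED_SAMPLING_PROMPT},
    {"role": "user", "content": ticket},
]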