Tiltgent CLI Profiles AI Agent Judgment Tilt via Blind Debates

Tiltgent CLI measures AI agents' systematic judgment biases (preferences for certain well-argued positions in blind debates) across five ideological axes using 21 calibrated archetypes, enabling prompt regression testing and model comparisons for $0.25–0.30 per run.

Blind Debates Quantify Judgment Tilt Across 5 Axes

Judgment tilt captures an AI agent's systematic preference for one well-argued position over another in blind comparisons, driven by training, RLHF, and prompts. Even vanilla models show tilt: early tests measured -0.50 on Stability and -0.40 on Tradition. Tiltgent generates 10 escalating debate rounds from a topic, pitting arguments from 21 worldview archetypes positioned on five axes: Order↔Emergence, Humanist↔Systems-first, Stability↔Dynamism, Local agency↔Coordinated scale, Tradition↔Reinvention.
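
The axis representation suggests a simple data shape. A minimal sketch in TypeScript, with field names and coordinates that are illustrative assumptions rather than the repo's actual schema:

```typescript
// Hypothetical representation of an archetype's position on the five axes.
// Names and values are illustrative; see the repo for the real schema.
interface AxisVector {
  orderEmergence: number;       // -1 = Order, +1 = Emergence
  humanistSystems: number;      // -1 = Humanist, +1 = Systems-first
  stabilityDynamism: number;    // -1 = Stability, +1 = Dynamism
  localCoordinated: number;     // -1 = Local agency, +1 = Coordinated scale
  traditionReinvention: number; // -1 = Tradition, +1 = Reinvention
}

interface Archetype {
  name: string;
  vector: AxisVector;
  systemPrompt: string; // unique rhetorical moves, accusations, vocabulary
}

// One of the archetypes named later in this summary, with made-up coordinates:
const hardAccelerationist: Archetype = {
  name: "Hard Accelerationist",
  vector: {
    orderEmergence: 0.8,
    humanistSystems: 0.7,
    stabilityDynamism: 0.9,
    localCoordinated: 0.3,
    traditionReinvention: 0.9,
  },
  systemPrompt: "...",
};
```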

Archetypes are paired by Euclidean distance to maximize ideological separation, each with a unique system prompt, rhetorical moves, accusations, and vocabulary to avoid overlap. Your agent judges blindly (no labels) and picks winners three times per round for consensus, yielding a pick-agreement rate (e.g., 0.93) and a count of unstable rounds (e.g., 1); a vanilla baseline run is then subtracted to isolate your prompt's effect. Output: a JSON profile with dimension scores (e.g., order_emergence: 0.65), contradiction lines (e.g., "You champion market forces... but go cold when they threaten human welfare"), and stability metrics.
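
Under the same hypothetical schema as the sketch above, the distance-based pairing and the three-vote consensus reduce to a few lines; this is a sketch of the idea, not Tiltgent's actual implementation:

```typescript
// Euclidean distance between two archetype vectors: larger distance means
// greater ideological separation, which is what pairing maximizes.
function distance(a: AxisVector, b: AxisVector): number {
  const keys = Object.keys(a) as (keyof AxisVector)[];
  return Math.sqrt(keys.reduce((sum, k) => sum + (a[k] - b[k]) ** 2, 0));
}

// Majority vote over the three blind picks in one round. Rounds that are not
// unanimous are the "unstable rounds"; the share of agreeing picks across all
// rounds gives the pick-agreement rate (e.g., 0.93).
function consensus(picks: ("A" | "B")[]): { winner: "A" | "B"; unanimous: boolean } {
  const aVotes = picks.filter((p) => p === "A").length;
  return {
    winner: aVotes * 2 > picks.length ? "A" : "B",
    unanimous: aVotes === 0 || aVotes === picks.length,
  };
}
```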

Run npx tiltgent eval --prompt your-agent.txt --topic "Universal basic income" for a 5-minute eval (~$0.25–0.30 in Anthropic API cost). Use tiltgent diff for instant profile comparisons and tiltgent inspect for terminal views. MIT-licensed, 3 dependencies, bring your own API key.
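
For illustration, a profile of the kind described above might look like the following; the key names are assumptions, not the tool's documented output format:

```typescript
// Hypothetical shape of the JSON profile an eval run emits.
const profile = {
  dimensions: {
    order_emergence: 0.65, // positive tilts toward Emergence
    humanist_systems: -0.4,
    stability_dynamism: -0.5,
    local_coordinated: 0.1,
    tradition_reinvention: -0.4,
  },
  contradictions: [
    "You champion market forces... but go cold when they threaten human welfare",
  ],
  stability: { pickAgreement: 0.93, unstableRounds: 1 },
};
```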

Archetype Calibration Prevents Style Over Substance Bias

The 21 archetypes underwent triple audits (ChatGPT, Gemini, Grok), yielding 14 vector fixes, 11 prompt sharpenings, 2 merges (pairs indistinguishable in blind tests), and 3 additions to fill gaps. Universal debate prompts enforce a focus on substance, countering prose dominance; without that constraint, dramatic styles win regardless of worldview.
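
The real prompts live in the repo; purely as an illustration of the substance-over-style constraint, a judge instruction might read something like:

```typescript
// Illustrative only: the repo's universal debate prompts are the source of truth.
const judgeInstruction = `
You will see two anonymous arguments on the same topic. Pick the one with the
stronger substance: evidence, internal consistency, and engagement with the
opposing case. Ignore rhetorical flair, dramatic tone, and prose style.
Answer "A" or "B" only.
`;
```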

Synthetic validation ran 4 agents (Hard Accelerationist, Cautious Humanist, etc.) on 2 topics at temp=0 and showed stable picks, 0.93 axis separation (Humanist vs. Systems-first), and topic-varying baseline tilt, which mandates per-topic calibration. Baseline subtraction reduces self-preference, though Anthropic models currently both generate and judge (multi-model support is next).
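
Baseline subtraction is per-dimension arithmetic: run the vanilla model on the same topic, then subtract its scores from your agent's. A sketch, assuming profiles are flat maps of dimension scores:

```typescript
// Per-dimension baseline subtraction: removes the vanilla model's own
// topic-specific tilt so what remains is attributable to your prompt.
function subtractBaseline(
  agent: Record<string, number>,
  vanilla: Record<string, number>,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const dim of Object.keys(agent)) {
    out[dim] = agent[dim] - (vanilla[dim] ?? 0);
  }
  return out;
}

// e.g., agent stability_dynamism = -0.80, vanilla = -0.50 -> prompt effect -0.30
```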

The full roster and prompts are public in the repo; audit them yourself.

Prompt Testing and Diagnostics Drive Production Use

Test prompt changes by running eval before and after; diff shows dimension shifts (e.g., Humanist↔Systems-first). Profile across topics (balanced on healthcare? Market-tilted on economics?). Compare models under the same prompt. Before deploying, inspect summarizers and triage agents for argumentative leanings.
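
One way to wire this into CI, sketched under the same assumed profile shape (the threshold is a made-up choice, not a Tiltgent feature):

```typescript
// Hypothetical regression guard: fail the build if any dimension shifted more
// than a tolerance between the "before" and "after" eval profiles.
function assertNoRegression(
  before: Record<string, number>,
  after: Record<string, number>,
  maxShift = 0.2,
): void {
  for (const dim of Object.keys(before)) {
    const shift = Math.abs((after[dim] ?? 0) - before[dim]);
    if (shift > maxShift) {
      throw new Error(`${dim} shifted by ${shift.toFixed(2)} (> ${maxShift})`);
    }
  }
}
```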

The blind format reveals preferences under pick pressure, which beats direct opinion queries that only yield hedges. It is not a moral-bias label or a fact-check: it assumes competently argued positions and measures value tilts (e.g., libertarian agents favor markets, health agents favor coordination).

Rhetorical Balance Remains Open Challenge

Archetypes are not perfectly matched in persuasiveness; one won 4/4 matchups by invoking "second-order consequences" authority. Per-topic baselines mitigate this but don't eliminate it. v0.1 is unproven on production agents, non-Anthropic targets (GPT-4, etc.), and open models; the engine is model-agnostic, but that validation is pending.


© 2026 Edge