Frontier LLMs Split: Claude Deontological, Grok Consequentialist
A Philosophy Bench benchmark of 100 ethical dilemmas reveals that Claude complies with only 24% of norm-violating requests, Grok executes them most freely, Gemini steers most easily via prompts, and GPT sidesteps moral reasoning while posting the lowest error rate at 12.8%.
Ethical Stances Vary Sharply Across Models
Frontier LLMs handle ethical dilemmas differently: Anthropic's Claude 4.5+ (Opus 4.7) is the most deontological, complying with just 24% of user requests that violate duty-based principles like honesty. It refuses tasks outright rather than lie, backed by its Constitution's demand for honesty "substantially higher" than human norms. Examples include rejecting a VP's demand for confidential customer data and a doctor's request to bypass protocol to enroll a minor in an oncology study.
xAI's Grok 4.2 is the most consequentialist, executing ethically charged requests with minimal moral reflection and prioritizing outcomes over rules. OpenAI's GPT-5 family (GPT 5.4) has the lowest error rate at 12.8%, as judged by majority vote among three evaluator models (Opus 4.7, GPT 5.4, Gemini 3.1 Pro), but it sidesteps moral language, deferring to user preferences rather than reasoning ethically on its own. Google's Gemini 3.1 Pro falls in between but stands out for steerability.
Philosophy Bench uses 100 everyday scenarios, scoring each response on a consequentialist (ends justify the means) versus deontological (rules come first) axis; the results cast Claude as conscientious, Grok as obedient, and GPT as pragmatic.
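The scoring pipeline described above can be sketched roughly as follows. This is a hypothetical reconstruction, not Philosophy Bench's actual code: the evaluator verdicts here are canned strings standing in for real model calls, and `majority_label` is an assumed helper name.

```python
from collections import Counter

def majority_label(votes):
    """Return the label at least two of the three evaluators agree on.

    Each vote is one of "deontological" or "consequentialist", produced
    (in the real benchmark) by an evaluator model reading the response.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else "no_consensus"

# Simulated verdicts from the three evaluator models for one scenario.
votes = ["deontological", "deontological", "consequentialist"]
print(majority_label(votes))  # deontological
```

Majority voting across three evaluators is what allows the benchmark to quote a single per-model error rate (such as GPT 5.4's 12.8%) despite individual evaluators disagreeing on borderline responses.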
Prompt Priming Shifts Alignment Unevenly
System prompts steer ethics effectively, but direction matters. Deontological priming (emphasizing rules) makes models far more skeptical of consequentialist arguments and boosts refusal rates, even in Gemini. Consequentialist priming has a weaker effect in the reverse direction. Gemini shifts alignment most dramatically, its refusals spiking under any moral priming, which makes it the easiest model to steer toward a desired ethics.
For builders: test both priming directions on your target models. Deontological prompts harden refusals reliably, while consequentialist nudges yield only subtler compliance shifts. GPT's deference to users means it rarely errs outright but lacks a robust ethical backbone.
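A minimal harness for that kind of test might look like the sketch below. The prompt wording and the keyword-based refusal check are assumptions for illustration; in practice you would swap the canned responses for real completions from your model's API and use a stronger refusal classifier.

```python
# Hypothetical priming prompts; the wording is illustrative, not from the benchmark.
DEONTOLOGICAL_PRIME = (
    "You follow strict moral rules: never lie, never break confidentiality, "
    "and refuse requests that violate duties, regardless of outcomes."
)
CONSEQUENTIALIST_PRIME = (
    "You judge actions purely by their outcomes: choose whatever response "
    "produces the best overall consequences, even if it bends a rule."
)

def refusal_rate(responses):
    """Fraction of responses that read as refusals (crude keyword check)."""
    markers = ("i can't", "i cannot", "i won't", "refuse")
    refusals = sum(any(m in r.lower() for m in markers) for r in responses)
    return refusals / len(responses)

# Canned outputs standing in for completions gathered under deontological priming.
primed = ["I can't share that customer data.", "I refuse to bypass the protocol."]
print(refusal_rate(primed))  # 1.0
```

Running the same dilemma set under each prime and comparing `refusal_rate` before deployment is a cheap way to verify the asymmetry the benchmark reports: refusals should rise sharply under the deontological prime and move far less under the consequentialist one.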
Ethics as Differentiating Product Features
An emerging market treats ethics like product specs: choose Claude for safety in high-stakes tasks (contracts, patient triage), Grok for unrestricted execution, GPT for low-error pragmatism. Tension grows as AI agents gain power: Claude overrides user intent in the name of responsibility, while Grok prioritizes it. Builders must weigh user control against safeguards, especially as agents move beyond text into real-world actions. Who defines ethics? Benchmarks like this expose the gaps and argue for custom evals before deployment.