Stronger AI Agents Win Deals, and Losing Users Never Notice
In Anthropic's marketplace experiment, Claude Opus agents closed roughly two more deals and fetched $3.64 higher prices than Haiku agents, yet users rated deal fairness nearly identically (4.05 vs. 4.06 out of 7), leaving the inequality invisible.
Stronger Models Dominate Negotiations Without Prompt Tweaks
Anthropic's Project Deal pitted Claude Opus 4.5 against Claude Haiku 4.5 in four parallel Slack-based marketplaces with 69 employees, each given a $100 budget to buy and sell real items like snowboards and ping-pong balls. Agents handled everything (listings, offers, haggling) after initial human interviews shaped custom prompts.
Opus consistently outperformed. In mixed runs, Opus users closed roughly two more deals on average, and the same items fetched $3.64 more when sold by Opus agents (a $2.68 premium overall across 161 repeated items). Examples: a lab ruby sold for $65 via Opus (which opened at $60 amid bidding) versus $35 via Haiku (which opened at $40 and dropped); a broken bike went for $65 versus $38. Opus buyers also paid $2.45 less. Opus-vs-Haiku deals averaged $24.18, compared with $18.63 for Opus-on-Opus deals (median deal price $12, mean $20.05). Negotiation style (aggressive versus friendly) conferred no edge: the higher prices attributed to 'aggressive' agents stemmed from higher opening offers, not tactics.
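The repeated-item comparison above amounts to a paired analysis: take each item sold by both models' agents and average the price gap. A minimal sketch, using illustrative numbers (only the ruby and bike prices come from the experiment; the snowboard figure is a made-up placeholder):

```python
from statistics import mean

# Paired sale prices for the same item sold by each model's agents.
# Ruby and bike prices are from the reported results; the snowboard
# entry is hypothetical filler to show the structure.
paired_sales = {
    "lab ruby": {"opus": 65.0, "haiku": 35.0},
    "broken bike": {"opus": 65.0, "haiku": 38.0},
    "snowboard": {"opus": 42.0, "haiku": 40.0},
}

def seller_premium(pairs):
    """Average price difference (Opus minus Haiku) across repeated items."""
    return mean(p["opus"] - p["haiku"] for p in pairs.values())

print(f"Opus seller premium: ${seller_premium(paired_sales):.2f}")
```

Pairing on identical items controls for item quality, so the premium isolates the agents' negotiating capability rather than differences in what was listed.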
A pure-Opus run produced 186 deals and $4,000 in volume, with baseline fairness rated 4/7. The takeaway: use frontier models for AI commerce to capture value; weaker models erode margins invisibly in mixed markets.
Perception Gap Masks Worse Outcomes
Haiku users rated deal fairness at 4.06/7, nearly identical to Opus users' 4.05/7, with no difference in satisfaction. Of the 28 participants who used both models, 17 preferred Opus but 11 favored Haiku, showing that users cannot reliably detect their losses. This 'invisible inequality' arises because weaker agents fail to counter stronger ones effectively, yet the outcomes still feel equitable.
To test this, run an A/B experiment: assign models randomly in an agent market and survey users after each deal. This reveals how capability gaps compound without feedback, and how trust erodes once they are uncovered.
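Such an A/B test can be sketched in a few lines: random model assignment, then a per-arm comparison of realized prices against perceived fairness. The survey records below are illustrative, not the experiment's data:

```python
import random
from statistics import mean

def assign_models(user_ids, models=("opus", "haiku"), seed=0):
    """Randomly assign each user an agent model for an A/B market test."""
    rng = random.Random(seed)
    return {uid: rng.choice(models) for uid in user_ids}

def summarize(surveys):
    """Per model arm, compare realized deal price to perceived fairness.
    surveys: list of dicts with 'model', 'price', and 'fairness' (1-7)."""
    out = {}
    for m in {s["model"] for s in surveys}:
        arm = [s for s in surveys if s["model"] == m]
        out[m] = {
            "avg_price": mean(s["price"] for s in arm),
            "avg_fairness": mean(s["fairness"] for s in arm),
        }
    return out

# Illustrative survey rows: prices diverge, fairness ratings do not,
# mirroring the perception gap the experiment observed.
surveys = [
    {"model": "opus", "price": 24.18, "fairness": 4},
    {"model": "opus", "price": 20.00, "fairness": 4},
    {"model": "haiku", "price": 18.63, "fairness": 4},
    {"model": "haiku", "price": 15.00, "fairness": 4},
]
print(summarize(surveys))
```

The key design point is surveying immediately post-deal, so the fairness rating reflects the user's perception before any cross-arm comparison is possible.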
AI Agent Markets Risk Inequality Amplification
46% of participants said they would pay for such services, suggesting commercial viability, but scale introduces perils: corporate incentives skew optimization, attention-grabbing can dominate user benefit, and jailbreaks or prompt injections threaten autonomous actions. Policy lags behind; frameworks for transactional AI are needed to avoid reinforcing economic divides.
Prior experiments like Project Vend (Claude running a small shop) confirm that agents can ship real value. Build with safeguards: equalize access to top models, audit agent interactions, and regulate for transparency in mixed-capability markets.