GPT-5.5 Powers PhD Papers and RPGs from Few Prompts

Advances Across Models, Apps, and Harnesses Unlock Real Work

GPT-5.5 excels by integrating three layers: powerful models (GPT-5.5 Pro being the most capable), apps such as the desktop Codex (which rivals Claude Code for code execution), and harnesses that supply tools for computer control, research, coding, and a new image generator (GPT-imagegen-2) that renders high-quality text in images. Together, this stack lets AI take on tasks that have been put off for a decade. For instance, only GPT-5.5 Pro built a true procedurally generated 3D harbor town simulation that evolves from 3000 BCE to 3000 CE with user controls. Other models, including o3 (released a year ago) and open models like Kimi K2.6, merely swapped static buildings. GPT-5.5 Pro also finished in 20 minutes, versus 33 minutes for GPT-5.4 Pro. The image tool passes the Otter Test by depicting an otter on a plane using WiFi. It can also render formatted academic paper pages lying on a desk, or fill an art gallery with labeled otter-on-airplane images in the styles of Klimt, Rothko, Matisse, Monet, Picasso, Titian, Rembrandt, and O'Keeffe. None of this was possible a few months ago; it can now power slides, mockups, and websites.
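To make the distinction between procedural generation and swapping static buildings concrete, here is a minimal Python sketch. It is not the code any model produced: the names `generate_town`, `era_for`, and `Building`, the era thresholds, and the height limits are all hypothetical choices made for illustration. The idea it shows is that the town is derived from a seed and a year, so the same lots persist and fill in as time advances.

```python
import random
from dataclasses import dataclass

START_YEAR, END_YEAR = -3000, 3000  # 3000 BCE to 3000 CE; negative means BCE


@dataclass(frozen=True)
class Building:
    x: int
    y: int
    height: int
    era: str


def era_for(year: int) -> str:
    """Map a year to a coarse era label (hypothetical thresholds)."""
    if year < -500:
        return "ancient"
    if year < 1500:
        return "medieval"
    if year < 2100:
        return "modern"
    return "future"


def generate_town(seed: int, year: int, grid: int = 16) -> list[Building]:
    """Derive the town from (seed, year) rather than loading fixed models."""
    year = max(START_YEAR, min(END_YEAR, year))
    rng = random.Random(seed)

    # The seed fixes the order in which lots are settled, so the town
    # at a later year always contains the lots of an earlier year.
    lots = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(lots)

    # The year controls how much of the grid is built up.
    progress = (year - START_YEAR) / (END_YEAR - START_YEAR)
    count = max(1, round(progress * len(lots)))

    # The era controls what gets built (illustrative height limits).
    era = era_for(year)
    max_height = {"ancient": 2, "medieval": 4, "modern": 12, "future": 40}[era]
    return [Building(x, y, rng.randint(1, max_height), era)
            for x, y in lots[:count]]
```

Calling `generate_town(42, 1000)` twice returns the same layout, and moving the year forward adds buildings to the same settlement instead of replacing one fixed scene with another.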

Few Prompts Yield Production-Ready Outputs on Complex Data

Given hundreds of anonymized crowdfunding files (Stata, CSV, XLS, Word) and just four prompts, GPT-5.5-powered Codex sorted the data, generated hypotheses, tested them with sophisticated methods that addressed causation, reviewed the literature, and formatted a paper. The result was a near-PhD-quality academic paper with a real literature review and sound statistics. It could be criticized for the novelty of its hypotheses but not for errors, and it was roughly equivalent to the work of a second-year PhD student, with no human edits to the text. Iterating on feedback from GPT-5.5 Pro refined it further. Similarly, a single prompt to Codex produced a complete fantasy tabletop RPG: it created an original world and rules (drawing on D&D patterns while adding unique elements), simulated playtesting, revised the rules, formatted a 101-page PDF, and illustrated it. The output was playable, with a novel setting and technically sound mechanics.

Jagged Frontier Persists Despite Accelerating Gains

Progress is accelerating: a year ago these feats were impossible, and the leaps grow with each release cycle, pushing the frontier outward. Yet AI still struggles with long-form fiction. The RPG text shows it through an uncanny tone, complex ideas that never pay off, odd metaphors (e.g., "weather and architecture are the same argument at different speeds"), ornate repetition, uniformly clipped dialogue, and overused names like "Mara." Hypotheses can also lack spark even when the statistics are rigorous. All of this signals ongoing rapid improvement, not an endpoint. The author's gallery, which shows every model's attempt at the 3D simulation, lets readers compare for themselves.