AI Coding Tools Cut Learning 15 Points Unless You Probe 'Why'
Anthropic study: Developers learning a new Python library with GPT-4o scored 15 points worse (50% vs 65%) than a docs-only group. Asking the AI 'why' or for explanations preserves learning; pure delegation tanks scores to 39%. No time savings for novel tasks.
AI Interaction Patterns Dictate Learning Outcomes
In a study of 52 Python-experienced developers learning the unfamiliar Trio library, GPT-4o users averaged 50% on concept quizzes versus 65% for the documentation-only control group, a 15-point drop. Task completion times were similar (23 vs 25 minutes), showing no productivity edge. Screen recordings revealed six usage patterns:
**Poor learning (24-39% scores):**
- Full delegation: Fastest at 19.5 minutes but 39% score—AI handles everything.
- Progressive reliance: Starts solo, shifts to AI copy-paste (22 min, 35%).
- Iterative debugging: Repeatedly asks AI to fix errors without comprehension (31 min, 24%).
**Strong learning (65-86% scores):**
- Conceptual inquiry only: 22 min, 65%—uses AI for ideas, not code.
- Hybrid code + explanation: Requests rationale with code (24 min, 68%).
- Generation then comprehension: AI generates code, user asks targeted follow-ups (24 min, 86%).
Key: Cognitive engagement via questions preserves understanding; passive use erodes it.
No Speed Gains for Novel Tasks; Chat Overhead Hurts
AI didn't accelerate learning new skills, unlike prior studies that measured familiar tasks. Only 20% of AI users stuck to pure code generation and finished faster than controls—but they scored worst. Others lost time composing prompts (up to 11 minutes of chatting) and issuing extra queries (1-15 per participant). AI productivity shines for repetition, not first-time concept mastery.
The control group encountered 3x more errors (e.g., TypeError, RuntimeWarning on unawaited coroutines), forcing deeper debugging. This 'painful stuckness' built intuition, especially for the quiz's debugging questions, where the gap between groups was largest.
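For context, here is a minimal sketch (illustrative only; `fetch` is a hypothetical function, not from the study) of the two classic coroutine mistakes that produce exactly these errors when newcomers learn Trio:

```python
import trio

async def fetch(label: str) -> str:
    # Hypothetical stand-in for real I/O; `await` marks a Trio checkpoint.
    await trio.sleep(0.1)
    return label

async def main():
    async with trio.open_nursery() as nursery:
        # Correct: pass the async function and its arguments separately.
        nursery.start_soon(fetch, "a")

        # Mistake #1: calling the coroutine yourself. Trio raises TypeError
        # because start_soon expects an async *function*, not an
        # already-created coroutine object:
        #     nursery.start_soon(fetch("b"))

    # Mistake #2: forgetting `await` runs nothing and Python emits
    # "RuntimeWarning: coroutine 'fetch' was never awaited":
    #     fetch("c")

trio.run(main)  # likewise, trio.run(main()) raises TypeError
```

Hitting and untangling errors like these is the kind of friction the control group worked through on their own.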
Preserve Skills: Use AI to Amplify, Not Replace Effort
Aggressive AI adoption in engineering risks atrophying competence, which matters most in safety-critical applications where humans must audit code. Adopt thoughtfully: limit AI to conceptual queries, or request explanations alongside code, to keep learning gains without sacrificing speed. Agentic tools like Claude Code, which require even less human input, may worsen the effect. Study caveat: one hour with a chat interface; real workflows vary, but cognitive engagement remains the key factor.