#pretraining
AI Training Pitfalls: Distillation, Failures, Scaling Insights
Frontier labs can't easily stop cheap distillation ($25M for 1T tokens); pretraining runs fail via causality breaks (expert-choice routing, token dropping) and FP16 numerical bias; FSDP scales until communication becomes the bottleneck, after which pipeline parallelism is layered on; Pipeline RL fixes stragglers caused by variable-length RL rollouts.
Dwarkesh Patel
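
The FP16 point in the summary is concrete enough to demonstrate. A minimal Python sketch (illustrative only, not from the episode; the addend, count, and naive loop are assumptions) of how low-precision accumulation silently biases a sum downward:

```python
# Illustrative sketch (not from the episode): naive FP16 accumulation
# biases a sum low. Once half a ULP of the running sum exceeds the
# addend, each addition rounds away to nothing and the sum stalls.
import numpy as np

vals = np.full(100_000, 1e-4, dtype=np.float16)  # true sum = 10.0

acc16 = np.float16(0.0)
for v in vals:
    acc16 = np.float16(acc16 + v)  # FP16 running sum stalls near 0.25

acc32 = vals.astype(np.float32).sum()  # FP32 accumulation: ~10.0

print(f"FP16 accumulator: {float(acc16):.4f}")  # far below the true sum
print(f"FP32 accumulator: {float(acc32):.4f}")
```

This is the standard reason mixed-precision setups keep optimizer state and gradient reductions in FP32 rather than FP16.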