#pretraining
AI Training Pitfalls: Distillation, Failures, Scaling Insights
Frontier labs can't easily stop cheap distillation ($25M for 1T tokens); pretraining runs fail via causality breaks (expert-choice routing, token dropping) and FP16 numerical bias; FSDP scales until communication becomes the bottleneck, after which pipeline parallelism is layered on; Pipeline RL fixes stragglers caused by variable-length RL rollouts.
Dwarkesh Patel
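
The FP16 point in the summary is concrete enough to demonstrate. A minimal Python sketch (illustrative only, not from the episode; the addend, count, and naive loop are assumptions) of how low-precision accumulation silently biases a sum downward:

```python
# Illustrative sketch (not from the episode): naive FP16 accumulation
# biases a sum low. Once half a ULP of the running sum exceeds the
# addend, each addition rounds away to nothing and the sum stalls.
import numpy as np

vals = np.full(100_000, 1e-4, dtype=np.float16)  # true sum = 10.0

acc16 = np.float16(0.0)
for v in vals:
    acc16 = np.float16(acc16 + v)  # FP16 running sum stalls near 0.25

acc32 = vals.astype(np.float32).sum()  # FP32 accumulation: ~10.0

print(f"FP16 accumulator: {float(acc16):.4f}")  # far below the true sum
print(f"FP32 accumulator: {float(acc32):.4f}")
```

This is the standard reason mixed-precision setups keep optimizer state and gradient reductions in FP32 rather than FP16.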