Karpathy's Pure Python AI From Scratch
Andrej Karpathy distills neural nets, LLMs, RL, and Bitcoin into 200-500-line pure Python scripts with no dependencies, teaching core mechanics hands-on.
Minimal Code for Core AI Models
Train and run a full GPT in about 200 lines of dependency-free Python, covering tokenization, model architecture, the training loop, and sampling, showing that LLMs are approachable without frameworks (see the tokenizer sketch below). Similarly, implement deep RL that masters Atari Pong from raw pixels using policy gradients, weighing their pros (simplicity, end-to-end learning from pixels) against their cons (sample inefficiency, high variance); a toy update rule follows below. Character-level RNNs generate surprisingly plausible poetry, LaTeX, and code; analyzing what they learn points to future directions like better optimization. Fool ImageNet classifiers with tiny perturbations, showing that even linear models (not just convnets) break easily, challenging robustness claims; a minimal linear attack is sketched after the RL example.
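To make the tokenization step concrete, here is a minimal character-level encode/decode sketch in pure Python. The vocabulary scheme is an assumption about how such a script might work, not a quote of the actual code:

```python
# Minimal character-level tokenizer sketch (assumed scheme, not the
# actual script): map each unique character to an integer id and back.
text = "hello gpt"
chars = sorted(set(text))                 # vocabulary = unique characters
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]   # string -> list of ids
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
print(ids, "->", decode(ids))
```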
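The policy-gradient update itself fits in a few lines. The following toy REINFORCE loop on a hypothetical 2-armed bandit (a stand-in, not the Pong setup) shows the core mechanic: raise the log-probability of actions in proportion to their advantage, with a running baseline to cut variance:

```python
# REINFORCE sketch on a 2-armed bandit (hypothetical toy task):
# nudge action logits toward actions that earned above-baseline reward.
import math, random

random.seed(0)
logits = [0.0, 0.0]        # one logit per action
true_payout = [0.2, 0.8]   # assumed setup: arm 1 pays off more often
lr = 0.1

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

baseline = 0.0
for step in range(2000):
    probs = softmax(logits)
    a = 0 if random.random() < probs[0] else 1          # sample an action
    reward = 1.0 if random.random() < true_payout[a] else 0.0
    baseline += 0.01 * (reward - baseline)              # running baseline
    advantage = reward - baseline
    # grad of log softmax: (1 - p_i) for the taken action, (-p_i) otherwise
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * advantage * grad              # gradient ascent

print(softmax(logits))  # the policy should concentrate on the better arm
```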
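And the linear-classifier attack reduces to a single gradient step. Below is a hedged sketch with made-up weights: since a linear model's score is w·x, moving every pixel by eps in the direction sign(w) is the most damaging perturbation under a per-pixel budget:

```python
# Fast-gradient-sign sketch against a binary linear classifier
# (toy weights, not the ImageNet model from the post): push each
# pixel a tiny step in the direction that most raises the score.

w = [0.7, -1.2, 0.4, 0.9]   # hypothetical learned weights
x = [0.1, 0.5, -0.3, 0.2]   # input "image", correctly scored negative
eps = 0.25                  # per-pixel perturbation budget

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

# the gradient of the score w.r.t. x is just w, so stepping each pixel
# by eps * sign(w_i) pushes the score across the decision boundary
x_adv = [xi + eps * sign(wi) for xi, wi in zip(x, w)]

print(score(w, x), "->", score(w, x_adv))  # score flips from - to +
```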
Historical Benchmarks and Progress
Revisit LeCun's 1989 backprop-trained neural net, arguably the first real-world end-to-end deep learning application, then upgrade it with 33 years of advances (e.g., modern optimizers, architectures) to quantify progress, and extrapolate how 2022's deep learning will look by 2055. On ImageNet, a trained human scores about 5.1% top-5 error versus GoogLeNet's 6.7%, yet manually labeling CIFAR-10 shows human baselines aren't unbeatable. The state of computer vision in 2012 still lagged far behind human vision, tempering AI hype.
Practical Training and Experiments
Follow a battle-tested recipe for training neural nets: size batches at 0.2-10% of GPU memory, start with weak regularization and strengthen it later, and cosine-anneal the learning rate over roughly 1M steps (schedule sketch below). Scrape 2M selfies to train convnets that separate good from bad #selfies, visualizing what the networks 'think'. Track productivity via window/keystroke logging on Ubuntu/OSX, generating HTML visualizations for insight. Biohacking basics: run small experiments that tweak energy metabolism. PhD survival: navigate academia with tips on focus and advising.
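As one concrete reading of the schedule advice, here is a minimal warmup-plus-cosine-annealing sketch. Only the ~1M-step horizon comes from the recipe above; the peak/floor learning rates and warmup length are hypothetical hyperparameters:

```python
# Cosine-annealing LR schedule sketch with linear warmup
# (hypothetical hyperparameters except the ~1M-step horizon).
import math

max_lr, min_lr = 3e-4, 3e-5
warmup_steps, total_steps = 10_000, 1_000_000

def lr_at(step):
    if step < warmup_steps:              # linear warmup from 0 to max_lr
        return max_lr * step / warmup_steps
    # cosine decay from max_lr down to min_lr over the remaining steps
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

for s in (0, 10_000, 500_000, 1_000_000):
    print(s, f"{lr_at(s):.2e}")          # 0 -> peak -> midpoint -> floor
```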