Tag: deep-learning

Summaries

Diffusion: Data-Efficient Framework Outshining Autoregressives on Scarce Data

Caleb Writes Code

Apr 28, 2026

Diffusion is a training framework, not an architecture, that creates extra training samples by gradually noising clean data over 1,000 steps. It outperforms autoregressive models at 25-100M tokens, where data is limited but compute is abundant, but still lags in text generation because of slow inference and infrastructure limitations.

machine-learning
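
The noising step can be sketched in a few lines of pure Python. This is a minimal sketch that assumes a DDPM-style linear variance schedule (beta rising from 1e-4 to 0.02; the endpoints are an assumption, not a detail from the summary) over the 1,000 steps mentioned above. `make_training_pairs` is a hypothetical helper that shows how one clean example yields many noisy training samples.

```python
import math
import random

T = 1000  # number of noising steps, as in the summary


def linear_betas(steps=T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (assumed DDPM-style linear schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = prod over s <= t of (1 - beta_s): how much signal survives to step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_to_step(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form; returns (x_t, eps)."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def make_training_pairs(x0, n, alpha_bar, rng):
    """Hypothetical helper: one clean sample -> n (t, x_t, eps) training targets."""
    pairs = []
    for _ in range(n):
        t = rng.randrange(len(alpha_bar))
        xt, eps = noise_to_step(x0, t, alpha_bar, rng)
        pairs.append((t, xt, eps))
    return pairs


alpha_bar = cumulative_alphas(linear_betas())
rng = random.Random(0)
pairs = make_training_pairs([0.5, -1.0, 2.0], n=8, alpha_bar=alpha_bar, rng=rng)
```

A denoiser trained to predict `eps` from `(t, x_t)` sees a different noisy view of the same clean example on every draw, which is one way to read the data-efficiency argument.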

Andrej Karpathy Blog

Apr 26, 2026

Karpathy's 200-Line Pure Python AI Builds

Train GPT, RNNs, and a Pong-playing RL agent, and build Bitcoin transactions, in pure Python with zero dependencies, distilling neural nets to their essentials in under 200 lines.

machine-learning
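
To make "distilling neural nets to essentials" concrete, here is a scalar reverse-mode autodiff sketch in the spirit of Karpathy's micrograd. It was written for this summary and is not his code; the `Value` class and its methods are illustrative names.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), backward=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward():
            self.grad += (1.0 - t * t) * out.grad

        out._backward = backward
        return out

    def backward(self):
        # Topological order guarantees a node's grad is complete before it propagates.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()


# One neuron: y = tanh(x * w + b)
x, w, b = Value(0.5), Value(-2.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()
```

Since `x * w + b` is 0 here, `tanh'` is 1 and the gradients reduce to `dy/dx = w`, `dy/dw = x`, `dy/db = 1`.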

DeepMind's Diffusion Model Training Secrets

AI Engineer

Apr 21, 2026

Sander from DeepMind argues that data curation matters more than model tweaks, that latent autoencoders enable scaling, and that diffusion denoises via spectral autoregression (coarse, low-frequency structure first, fine detail last), yielding superior audiovisual generation.

machine-learning
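
The spectral-autoregression view can be illustrated with a back-of-the-envelope calculation. This sketch assumes a power-law signal spectrum P(f) = f^-alpha (a common model for natural images) and flat Gaussian noise of power sigma^2; `visible_cutoff` is a hypothetical helper, not anything from the talk.

```python
def visible_cutoff(sigma, alpha=2.0, max_freq=1000):
    """Highest frequency whose signal power still reaches the noise power.

    Illustrative assumptions: signal power spectrum P(f) = f**-alpha and a
    flat (white) noise spectrum of height sigma**2.
    """
    cutoff = 0
    for f in range(1, max_freq + 1):
        if f ** -alpha >= sigma ** 2:
            cutoff = f
    return cutoff


# Heavy noise (early in sampling) leaves only low frequencies above the noise
# floor; as sigma shrinks during sampling, ever finer frequencies emerge.
cutoffs = {sigma: visible_cutoff(sigma) for sigma in (0.01, 0.1, 1.0)}
```

Read in order of decreasing sigma, the cutoff climbs from low to high frequencies, which is the sense in which denoising resembles autoregression over the spectrum.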

Towards AI

Apr 21, 2026

PCL: Confidence RL for Dynamic LLM Environments

The PCL algorithm integrates predictive confidence scores into LLM RL rewards, using ensembles and a blend of token- and sequence-level signals, which lets the model adapt to nonstationary environments without retraining.

machine-learning
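
The ingredients named above (an ensemble, a confidence score, blended token and sequence signals) can be combined in many ways. The sketch below is one hypothetical combination, not PCL's actual formula: confidence is taken as `1 / (1 + spread)` of the ensemble's rewards, and `lam` is an assumed blending weight.

```python
import statistics


def blended_reward(token_scores, seq_score, lam=0.5):
    """Blend the mean token-level score with a sequence-level score (lam is hypothetical)."""
    return lam * statistics.fmean(token_scores) + (1.0 - lam) * seq_score


def confidence_weighted_reward(ensemble, lam=0.5):
    """ensemble: list of (token_scores, seq_score) pairs, one per reward model.

    Returns (reward, confidence). Confidence shrinks as ensemble members
    disagree, so uncertain rewards move the policy less. This weighting is
    an illustrative assumption, not the PCL paper's formula.
    """
    rewards = [blended_reward(tok, seq, lam) for tok, seq in ensemble]
    mean = statistics.fmean(rewards)
    spread = statistics.pstdev(rewards)
    confidence = 1.0 / (1.0 + spread)
    return confidence * mean, confidence


agree = [([0.8, 0.6], 0.7)] * 3
disagree = [([0.9, 0.9], 0.9), ([0.1, 0.2], 0.0), ([0.5, 0.5], 0.5)]
r_agree, c_agree = confidence_weighted_reward(agree)
r_disagree, c_disagree = confidence_weighted_reward(disagree)
```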

Generative AI

Apr 21, 2026

Sentences Define Word Meanings via Self-Attention

Transformers overcame 30 years of sequential-processing limitations with self-attention, in which every word weighs its relevance against the entire sentence context; this mechanism powers GPT and all modern LLMs.

machine-learning
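
A minimal single-head, scaled dot-product self-attention in pure Python shows "every word weighs relevance from the entire sentence" directly. The toy vectors and identity projection matrices are assumptions chosen for readability; real models learn `Wq`, `Wk`, and `Wv`.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors X."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    outputs, weights = [], []
    for q in Q:
        # How relevant is every token (including itself) to this one?
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output = relevance-weighted mix of all value vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return outputs, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "word" vectors
outputs, weights = self_attention(X, identity, identity, identity)
```

Each row of `weights` is a probability distribution over all tokens, so every output vector is a context-dependent blend of the whole sequence.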

LLM Inference: mmap Loading & Quantization Deep Dive

Caleb Writes Code

Apr 20, 2026

Efficient LLM inference hinges on two techniques: mmap for lazy loading of weights into memory (e.g., under 10 s startup in llama.cpp), and quantization schemes such as GGUF K-Quants or AWQ/EXL2, which shrink 15 GB models while preserving quality by protecting salient weights and using mixed precision.

machine-learning
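
The core idea behind block quantization fits in a short sketch: split weights into blocks, store one scale per block, and round each weight to a small integer. This is a simplified symmetric absmax scheme with 32-weight blocks, loosely in the style of basic GGUF formats; it is an assumption for illustration, and K-Quants and AWQ/EXL2 are considerably more elaborate (super-blocks, salient-weight protection, mixed precision).

```python
import random


def quantize_blocks(weights, block_size=32, bits=4):
    """Symmetric absmax quantization: one float scale plus small ints per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / qmax if amax > 0 else 1.0
        q = [max(-qmax, min(qmax, round(w / scale))) for w in block]
        blocks.append((scale, q))
    return blocks


def dequantize_blocks(blocks):
    out = []
    for scale, q in blocks:
        out.extend(qi * scale for qi in q)
    return out


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(256)]  # hypothetical weight values
blocks = quantize_blocks(weights)
restored = dequantize_blocks(blocks)
bits_per_weight = 4 + 16 / 32  # 4-bit codes plus an assumed 16-bit scale per block
```

At about 4.5 bits per weight, a 15 GB FP16 model would shrink to roughly 15 x 4.5 / 16 ≈ 4.2 GB under these assumptions.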

Andrej Karpathy Blog

Apr 20, 2026

Karpathy's Blog: Pure Python AI From Scratch

Andrej Karpathy distills neural nets into minimal Python code (200 lines for GPT training and inference) and also covers RL, RNNs, and human baselines on vision tasks.

machine-learning

Level Up Coding

Apr 20, 2026

Preprocessing Swings CNN Accuracy from 65% to 87% on CIFAR-10

Raw CIFAR-10 pixels yield 65% test accuracy; normalization/standardization lifts this to 69%; geometric augmentation holds at about 67%; photometric brightness/contrast augmentation crashes it to 20%; and the combined pipeline with a deeper CNN reaches 87%.

machine-learning
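
Per-channel standardization and a simple geometric augmentation (horizontal flip) can be sketched without any libraries. The random 8x8 "images" below are hypothetical stand-ins for CIFAR-10 data; in practice the statistics come from the training set and are reused unchanged on the test set.

```python
import math
import random


def channel_stats(images):
    """Per-channel mean and std over a list of H x W x 3 images (nested lists)."""
    sums, sqs, n = [0.0] * 3, [0.0] * 3, 0
    for img in images:
        for row in img:
            for px in row:
                for c in range(3):
                    sums[c] += px[c]
                    sqs[c] += px[c] * px[c]
                n += 1
    mean = [s / n for s in sums]
    std = [math.sqrt(max(sqs[c] / n - mean[c] ** 2, 1e-12)) for c in range(3)]
    return mean, std


def standardize(img, mean, std):
    """Zero-mean, unit-variance per channel, using training-set statistics."""
    return [[[(px[c] - mean[c]) / std[c] for c in range(3)] for px in row] for row in img]


def random_hflip(img, rng, p=0.5):
    """Geometric augmentation: mirror each row with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img


rng = random.Random(0)
train = [[[[rng.random() for _ in range(3)] for _ in range(8)] for _ in range(8)] for _ in range(4)]
mean, std = channel_stats(train)
train_std = [standardize(random_hflip(img, rng), mean, std) for img in train]
```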

DeepMind's AI Frontiers: Embeddings, Weather, Worlds

AI Engineer

Apr 19, 2026

DeepMind pushes Gemini beyond LLMs with omnimodal embeddings for unified retrieval, weather models that beat physics-based simulations (GraphCast: 10-day forecasts; GenCast: 15-day ensemble forecasts that outperform the leading operational system on 97% of benchmark targets), and Genie world simulators for interactive 3D environments.

machine-learning
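
Unified retrieval with a shared embedding space reduces to nearest-neighbour search. In this sketch the three-dimensional vectors are hypothetical placeholders for what an omnimodal embedding model would produce; no real embedding API is called.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, index, k=2):
    """Rank items of any modality by cosine similarity to the query embedding."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings; images, audio, and text live in one vector space.
index = {
    "photo_of_dog": [0.9, 0.1, 0.0],
    "audio_of_bark": [0.8, 0.2, 0.1],
    "weather_report": [0.0, 0.1, 0.9],
}
top = retrieve([1.0, 0.0, 0.0], index)
```

Because every modality lands in the same space, one query can return a photo and an audio clip side by side.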

__oneoff__

Apr 19, 2026

LLM Architecture Gallery: Diagrams, Specs & Diffs for 70+ Models

Sebastian Raschka's gallery visualizes 70+ LLM architectures with diagrams, key specs such as KV cache costs and attention types, and a diff tool, making it well suited to comparing dense vs. MoE designs and their inference tradeoffs.
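
KV cache cost, one of the specs the gallery compares, follows from simple arithmetic: one key and one value vector per layer, per KV head, per token. The configuration below is a hypothetical 7B-class dense model with 16-bit cache entries, chosen only to make the numbers concrete.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for a batch of sequences.

    Factor 2 = one K and one V tensor per layer; bytes_per_elem=2 assumes FP16/BF16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical configuration: 32 layers, 128-dim heads, 4,096-token context.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
```

Full multi-head attention needs 2 GiB per sequence here, while grouped-query attention with 8 KV heads needs 512 MiB, which is why attention type shows up next to KV cost in such comparisons.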

Towards AI

Apr 19, 2026

Attention Scores Are Kernel Evaluations via Mercer's Theorem

QK^T in attention computes kernel similarities between queries and keys; the article invokes Mercer's theorem to show that this defines a valid positive semi-definite kernel, making softmax a mathematical necessity for normalization rather than just an architectural choice.

machine-learning
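
Two parts of the kernel view can be checked numerically. First, the exponential dot-product kernel `exp(<x, y>)` yields a positive definite Gram matrix on these sample points (a Cholesky factorization succeeds). Second, normalizing kernel values over the keys is exactly softmax. The points are arbitrary assumptions, and the sketch uses the same vectors on both sides, i.e. the symmetric case; in a real layer distinct query and key projections make QK^T generally non-symmetric.

```python
import math


def exp_kernel(x, y, scale=1.0):
    """k(x, y) = exp(scale * <x, y>), the unnormalized softmax attention weight."""
    return math.exp(scale * sum(a * b for a, b in zip(x, y)))


def cholesky(A):
    """Return lower-triangular L with A = L L^T, or None if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    return None
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def kernel_smoother_weights(q, keys, scale=1.0):
    """Normalize kernel values over the keys: identical to softmax(scale * q.k)."""
    ks = [exp_kernel(q, k, scale) for k in keys]
    total = sum(ks)
    return [v / total for v in ks]


points = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
gram = [[exp_kernel(a, b) for b in points] for a in points]
weights = kernel_smoother_weights([1.0, 0.0], points)
```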

AI Simplified in Plain English

Apr 17, 2026

53x AI Efficiency via Model Distillation by 2025

Train small 'student' models on the soft probabilities of large 'teacher' models, not just on hard labels, to match their performance while cutting size and cost and increasing speed, for 53x efficiency gains by 2025.

machine-learning
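
Training on soft probabilities usually means minimizing the KL divergence between temperature-softened teacher and student distributions. This sketch follows the common Hinton-style formulation with a T^2 factor; the logits and temperature are illustrative assumptions, and in practice this term is often mixed with an ordinary hard-label loss.

```python
import math


def softmax(logits, T=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)  # soft targets carry "dark knowledge" about wrong classes
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * T * T  # T^2 keeps gradient magnitudes comparable across temperatures


teacher = [2.0, 1.0, 0.1]   # hypothetical teacher logits
student = [1.5, 1.2, 0.3]   # hypothetical student logits
loss = distillation_loss(student, teacher)
```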

MarkTechPost

Apr 16, 2026

Parcae Stabilizes Loops to Match 2x Transformer Quality

Parcae stabilizes looped transformers by modeling the loop as a dynamical system with a negative diagonal state matrix; it outperforms baselines and reaches 87.5% of the quality of a Transformer twice its size.

machine-learning
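
Why a negative diagonal state matrix helps can be shown with a toy linear recurrence. This is an illustration under stated assumptions, not Parcae's actual parameterization: the state matrix is `A = -exp(raw)` (negative by construction), discretized exactly as `exp(dt * A)`, so every per-channel factor lies in (0, 1) and repeated looping converges instead of exploding.

```python
import math


def stable_decay(raw, dt=1.0):
    """Map unconstrained parameters to per-channel decay factors in (0, 1).

    A = -exp(raw) is a negative diagonal state matrix by construction, and the
    exact discretization exp(dt * A) of each diagonal entry is a contraction.
    """
    return [math.exp(dt * -math.exp(r)) for r in raw]


def run_loops(decay, inject, n_loops):
    """Iterate h <- decay * h + inject, a toy stand-in for re-applying a looped block."""
    h = [0.0] * len(decay)
    for _ in range(n_loops):
        h = [d * hi + x for d, hi, x in zip(decay, h, inject)]
    return h


inject = [1.0, -2.0, 0.5]
decay = stable_decay([0.0, -1.0, 2.0])
stable = run_loops(decay, inject, n_loops=1000)
unstable = run_loops([1.1, 1.1, 1.1], inject, n_loops=200)  # unconstrained factor > 1
```

The constrained loop settles at the fixed point `inject / (1 - decay)` however many times it is applied, while a factor above 1 grows without bound.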

MarkTechPost

Apr 13, 2026

Build FNO & PINN Surrogates for Darcy Flow with PhysicsNeMo

Step-by-step Colab guide: generate 2D Darcy flow datasets with Gaussian random fields (GRFs) and finite differences, implement and train Fourier Neural Operators (FNOs) and PINNs, add CNN baselines, and benchmark inference speed for fast physics surrogates.

machine-learning
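
The finite-difference part of dataset generation can be sketched with a Jacobi solver for the Darcy equation -div(a grad u) = f on the unit square with zero boundary values. This is a pure-Python sketch, not the guide's PhysicsNeMo code; the constant coefficient field is a placeholder for the GRF-generated fields the guide uses, and Jacobi is chosen for clarity rather than speed.

```python
def solve_darcy(a, f=1.0, iters=1000):
    """Jacobi iteration for -div(a grad u) = f on the unit square, u = 0 on the boundary.

    a: N x N list of positive coefficients (permeability) at grid nodes.
    Face coefficients use the arithmetic mean of the two neighbouring nodes.
    """
    n = len(a)
    h2 = (1.0 / (n - 1)) ** 2
    u = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                ae = 0.5 * (a[i][j] + a[i + 1][j])
                aw = 0.5 * (a[i][j] + a[i - 1][j])
                an = 0.5 * (a[i][j] + a[i][j + 1])
                as_ = 0.5 * (a[i][j] + a[i][j - 1])
                flux = ae * u[i + 1][j] + aw * u[i - 1][j] + an * u[i][j + 1] + as_ * u[i][j - 1]
                new[i][j] = (flux + h2 * f) / (ae + aw + an + as_)
        u = new
    return u


# Constant permeability as a placeholder for a GRF-sampled coefficient field.
N = 16
u = solve_darcy([[1.0] * N for _ in range(N)])
```

Each (a, u) pair produced this way becomes one training example for an FNO or CNN surrogate.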

Towards AI

Apr 8, 2026

Word2Vec: Turning Word Neighborhoods into Embeddings

Word2Vec learns dense word vectors by predicting local contexts with CBOW or Skip-gram; similar words such as 'cat' and 'dog' end up close together because their shared neighborhoods drive repeated, similar gradient updates.

machine-learning
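
The two training setups differ only in which side of the window is the prediction target. A minimal sketch of how training examples are extracted, assuming a symmetric context window:

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: Skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """(context words, center) examples: CBOW predicts the center from its context."""
    examples = []
    for i, center in enumerate(tokens):
        ctx = [tokens[j] for j in range(max(0, i - window), min(len(tokens), i + window + 1)) if j != i]
        examples.append((ctx, center))
    return examples


tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
examples = cbow_examples(tokens, window=1)
```

In "the dog sat on the mat", 'dog' would produce the same ('the', 'sat') contexts as 'cat', so gradient updates pull both vectors toward the same context representations.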

Andrej Karpathy Gists

Apr 8, 2026

Batch GEMMs for Fast LSTM in Torch

Fuse the LSTM gate computations into a single nngraph module so that the four per-gate GEMMs run as one batched GEMM, cutting overhead compared with the standard nn.LSTM (optimized by @jcjohnson).

machine-learning
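
The fusion trick is to stack the four gate weight matrices into one 4H-row matrix so a single matrix product replaces four. This pure-Python sketch (not the Torch/nngraph gist) checks that the fused and unfused steps agree; the gate order [input, forget, output, candidate] is an assumption.

```python
import math
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _apply_gates(z, c, H):
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]    # output gate
    g = [math.tanh(v) for v in z[3 * H:4 * H]]  # candidate cell values
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def lstm_step_fused(x, h, c, W, b):
    """One matrix product over the stacked 4H x (D + H) weights."""
    H = len(h)
    z = [v + bb for v, bb in zip(matvec(W, x + h), b)]
    return _apply_gates(z, c, H)


def lstm_step_unfused(x, h, c, W, b):
    """Four separate products, one per gate, for comparison."""
    H = len(h)
    z = []
    for k in range(4):
        rows = W[k * H:(k + 1) * H]
        z.extend(v + bb for v, bb in zip(matvec(rows, x + h), b[k * H:(k + 1) * H]))
    return _apply_gates(z, c, H)


H, D = 3, 2
rng = random.Random(1)
W = [[rng.uniform(-0.5, 0.5) for _ in range(D + H)] for _ in range(4 * H)]
b = [0.0] * (4 * H)
fused = lstm_step_fused([0.5, -1.0], [0.0] * H, [0.0] * H, W, b)
unfused = lstm_step_unfused([0.5, -1.0], [0.0] * H, [0.0] * H, W, b)
```

On a GPU, the single larger GEMM amortizes kernel-launch and memory overhead that four small ones would pay separately.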

Andrej Karpathy Gists

Apr 8, 2026

Batched L2 Norm Layer for Torch Neural Nets

A custom Torch nn.Module that normalizes each row of an n x d input tensor to unit L2 norm, with efficient batched forward and backward passes for training.

machine-learning
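
The backward pass follows from differentiating y = x / ||x||: the gradient is (g - y (y . g)) / ||x||, i.e. the upstream gradient with its component along y removed. A pure-Python sketch (not the Torch module), assuming a small epsilon to guard against zero rows:

```python
import math


def l2norm_forward(X, eps=1e-12):
    """Normalize each row of X to unit L2 norm; keep norms for the backward pass."""
    norms = [math.sqrt(sum(v * v for v in row)) + eps for row in X]
    Y = [[v / n for v in row] for row, n in zip(X, norms)]
    return Y, norms


def l2norm_backward(dY, Y, norms):
    """dX = (dY - Y * <Y, dY>) / ||X||, computed row by row."""
    dX = []
    for dy, y, n in zip(dY, Y, norms):
        dot = sum(a * b for a, b in zip(dy, y))
        dX.append([(g - yi * dot) / n for g, yi in zip(dy, y)])
    return dX


X = [[3.0, 4.0], [1.0, -2.0]]
Y, norms = l2norm_forward(X)
G = [[0.3, -0.7], [1.1, 0.5]]  # arbitrary upstream gradient
dX = l2norm_backward(G, Y, norms)
```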

Andrej Karpathy Gists

Apr 8, 2026

Minimal NumPy RNN for Char-Level Text Gen

Build a vanilla RNN language model from scratch in ~170 lines of NumPy: it processes text in 25-character chunks, trains with BPTT and Adagrad, and generates a sample every 100 iterations.

machine-learning
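
The Adagrad update gives each parameter its own learning rate, shrinking steps for parameters that have already seen large gradients. A pure-Python sketch on flat lists; the defaults here (learning rate 0.1, clipping to [-5, 5]) are assumptions modeled on common char-RNN settings.

```python
import math


def adagrad_step(params, grads, memory, lr=0.1, clip=5.0, eps=1e-8):
    """In-place Adagrad update on flat lists of floats."""
    for i, g in enumerate(grads):
        g = max(-clip, min(clip, g))  # clip to mitigate exploding gradients in BPTT
        memory[i] += g * g            # running sum of squared gradients
        params[i] -= lr * g / math.sqrt(memory[i] + eps)


params = [1.0, 1.0, 0.0]
memory = [0.0, 0.0, 0.0]
adagrad_step(params, [0.5, -2.0, 100.0], memory)
```

On the first step every parameter moves by about `lr` in the direction opposite its gradient, regardless of gradient size; later steps shrink as `memory` accumulates.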

Andrej Karpathy Gists

Apr 8, 2026

NumPy Batched LSTM Forward/Backward

An efficient pure NumPy LSTM that processes batched sequences of shape (n, b, input_size), initialized with Xavier weights and a forget-gate bias of 3, and verified both against step-by-step sequential processing and with numerical gradient checks.

machine-learning
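
Numerical gradient checking compares analytic gradients with central differences. Below is a generic pure-Python sketch (not the gist's code), exercised on a simple cubic rather than an LSTM to keep it short; the step size and relative-error form are common conventions, not requirements.

```python
def numerical_grad(f, x, h=1e-5):
    """Central-difference estimate of df/dx_i for each coordinate of list x."""
    grad = []
    for i in range(len(x)):
        old = x[i]
        x[i] = old + h
        f_plus = f(x)
        x[i] = old - h
        f_minus = f(x)
        x[i] = old  # restore before moving to the next coordinate
        grad.append((f_plus - f_minus) / (2 * h))
    return grad


def relative_error(a, b, floor=1e-12):
    """Scale-free comparison of analytic vs numerical gradients."""
    return abs(a - b) / max(abs(a) + abs(b), floor)


def f(x):
    return sum(v ** 3 for v in x)


x = [1.0, -2.0, 0.5]
analytic = [3 * v ** 2 for v in x]
numeric = numerical_grad(f, x)
errors = [relative_error(a, n) for a, n in zip(analytic, numeric)]
```

Relative errors around 1e-7 or smaller usually indicate a correct backward pass; errors near 1e-2 point to a bug.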

Andrej Karpathy Gists

Apr 8, 2026

Policy Gradients for Pong: 100-Line RL Agent

Train a 2-layer NN to play Atari Pong from raw pixels using REINFORCE policy gradients. The agent uses 80x80 binarized difference frames, discounts rewards with gamma = 0.99, standardizes the advantages, and applies RMSProp updates every 10 episodes. It converges on a CPU within hours.

machine-learning
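
The reward-processing step can be sketched in pure Python. Discounting runs backwards through the episode; resetting the running sum at each nonzero reward is a Pong-specific choice, since every point scored ends a rally. The reward sequence is a hypothetical example.

```python
import math


def discount_rewards(rewards, gamma=0.99):
    """Discounted return for each step, computed backwards through time."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # Pong-specific: a nonzero reward ends a point, so reset
        running = running * gamma + rewards[t]
        out[t] = running
    return out


def standardize(xs, eps=1e-8):
    """Zero-mean, unit-variance advantages: roughly half the actions get encouraged."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / (std + eps) for x in xs]


rewards = [0.0, 0.0, 1.0, 0.0, -1.0]
returns = discount_rewards(rewards)
advantages = standardize(returns)
```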

Andrej Karpathy Blog

Apr 8, 2026

Karpathy's Pure Python AI From Scratch

Andrej Karpathy distills neural nets, LLMs, RL, and Bitcoin into 200-500 line pure Python scripts—no deps needed—to teach core mechanics hands-on.

machine-learning

Learning Data

Apr 8, 2026

Pause Before Trust: AI Fooled My Instincts

AI generates undetectable fakes that exploit human trust shortcuts—train yourself to pause and question realistic audio, video, or text instead of believing instantly.

TurboQuant: 6x KV Cache Compression Without Attention Loss

Reinike AI

Apr 7, 2026

TurboQuant rotates KV vectors before quantizing them to 3.5 bits per channel (quality-neutral) or 2.5 bits (minor degradation), then applies an error-correction step, yielding 6x memory savings and up to 8x speedups for long-context LLMs.

machine-learning
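
Why rotating before quantizing helps can be shown on a single vector with one outlier channel. This sketch uses a normalized Walsh-Hadamard transform as a stand-in for TurboQuant's rotation, 4-bit absmax rounding instead of its 3.5/2.5-bit schemes, and omits the error-correction step; the vector itself is hypothetical.

```python
import math


def hadamard_rotate(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two.

    The normalized transform is its own inverse, so applying it twice restores v.
    """
    v = list(v)
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    s = math.sqrt(n)
    return [x / s for x in v]


def absmax_quantize(v, bits=4):
    """Symmetric round-to-nearest quantization with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in v) / qmax
    return [round(x / scale) * scale for x in v]


def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical vector with one large outlier channel.
x = [8.0, 0.5, -0.3, 0.2, 0.1, -0.4, 0.3, -0.1]
err_plain = sq_error(x, absmax_quantize(x))
err_rot = sq_error(x, hadamard_rotate(absmax_quantize(hadamard_rotate(x))))
```

Without rotation, the outlier sets a coarse scale and every small value rounds to zero. The rotation spreads the outlier's energy evenly across channels, so the shared scale fits all of them, and because the transform is orthonormal the error measured after rotating back is the same as in the rotated domain.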

Dwarkesh Patel

Batch Size Math: Why LLM Inference Costs Plummet at Scale

Roofline analysis shows that batching 2,000+ tokens amortizes the cost of fetching weights from memory, slashing per-token cost by 1000x; "fast" modes use tiny batches for low latency at 6x the price.
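
A simplified roofline model makes the batching argument concrete: each decode step must read all weights once, whatever the batch size, while compute grows with the batch. All hardware and model numbers below are hypothetical round figures, and KV-cache traffic is ignored.

```python
def step_time(batch, n_params, bytes_per_param, mem_bw, peak_flops):
    """Roofline estimate of one decode step (seconds) for a dense model.

    Assumptions: weights are read from memory once per step regardless of
    batch size; each token costs ~2 * n_params FLOPs; KV-cache traffic and
    communication are ignored.
    """
    memory_time = n_params * bytes_per_param / mem_bw
    compute_time = 2 * n_params * batch / peak_flops
    return max(memory_time, compute_time)


def time_per_token(batch, **hw):
    return step_time(batch, **hw) / batch


# Hypothetical: 70B parameters in 16-bit weights, 3 TB/s bandwidth, 1 PFLOP/s compute.
hw = dict(n_params=70e9, bytes_per_param=2, mem_bw=3e12, peak_flops=1e15)
small = time_per_token(1, **hw)
large = time_per_token(2048, **hw)
speedup = small / large
```

With these assumed numbers, per-token time falls by roughly 330x until the step becomes compute-bound (around 330 tokens per batch); the exact factor depends on the model and hardware assumed.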

__oneoff__

DeepSeek-V3: 671B MoE Tops Benchmarks at $5.6M Cost

DeepSeek-V3, a 671B-parameter MoE LLM (37B active per token), was trained on 14.8T tokens using FP8 and optimized infrastructure in 2.8M H800 GPU hours ($5.6M total); it outperforms other open-source models and rivals GPT-4o and Claude-3.5-Sonnet in code, math, and reasoning.

machine-learning
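
The headline figures can be reproduced with simple arithmetic. The $2 per GPU-hour rental price is the assumption behind the reported cost, and the 6ND FLOP count is a standard rule of thumb applied to active parameters, not a figure from the summary.

```python
total_params = 671e9
active_params = 37e9        # parameters used per token (MoE routing)
train_tokens = 14.8e12
gpu_hours = 2.8e6           # H800 GPU hours, as summarized
usd_per_gpu_hour = 2.0      # assumed rental price behind the $5.6M figure

cost_usd = gpu_hours * usd_per_gpu_hour
active_fraction = active_params / total_params
# Common 6 * N * D approximation for training FLOPs, with N = active parameters.
train_flops = 6 * active_params * train_tokens
```

Only about 5.5% of the weights are active per token, which is how a 671B-parameter model trains for roughly 3.3e24 FLOPs.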

__oneoff__

Microsoft's Efficient 1-Bit LLMs and Multimodal AI Papers

Catalog of 70+ Microsoft papers on 1.58-bit LLMs for CPU inference, zero-shot TTS, long-context scaling to 1B tokens, and agentic reasoning via distillation and sparsity.

machine-learning
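
The "1.58-bit" label comes from ternary weights: log2(3) ≈ 1.585 bits per weight. This is a sketch of absmean ternary quantization as described for BitNet b1.58 (scale by the mean absolute weight, round, clip to {-1, 0, 1}); the weight matrix is hypothetical.

```python
import math


def absmean_ternary(W, eps=1e-6):
    """Quantize a weight matrix to {-1, 0, 1} using its mean absolute value as the scale."""
    flat = [w for row in W for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    Q = [[max(-1, min(1, round(w / (gamma + eps)))) for w in row] for row in W]
    return Q, gamma


W = [[0.9, -0.05, -1.2], [0.3, 0.0, -0.4]]
Q, gamma = absmean_ternary(W)
bits = math.log2(3)
```

With ternary weights, matrix multiplication reduces to additions and subtractions, which is what makes CPU inference attractive.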

__oneoff__

PhysicsNeMo: NVIDIA's Framework for Physics-ML Models

PhysicsNeMo equips developers with an open-source PyTorch-based toolkit to build, train, and fine-tune deep learning models incorporating physics constraints, supporting 20+ pre-implemented architectures for weather, mechanics, and more.

machine-learning

__oneoff__

World Models Build AI's Internal Reality Simulators

World models train on streams of experience to predict cause-and-effect dynamics, creating compact internal simulations for efficient planning and physical understanding that go beyond the token prediction of LLMs.

machine-learning
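
Planning with an internal simulator means trying action sequences "in imagination" before acting. In this sketch `learned_model` is a hypothetical placeholder for a trained dynamics predictor (here a known toy rule), and exhaustive search over short action sequences stands in for the sampling-based planners typically used.

```python
import itertools


def learned_model(state, action):
    """Hypothetical stand-in for a learned dynamics model: predicts the next state."""
    return state + action


def plan(state, goal, model, horizon=3, actions=(-1, 0, 1)):
    """Simulate every action sequence with the model and keep the cheapest one."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, cost = state, 0.0
        for a in seq:
            s = model(s, a)        # imagined rollout, no real environment steps
            cost += abs(goal - s)  # penalize distance from the goal at each step
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq


best = plan(0, 2, learned_model)
```

Only the first action of the plan is usually executed before replanning, so model errors are corrected as new observations arrive.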