Tokenization Breaks Text into Predictable Chunks for Processing
LLMs don't process individual letters or whole words; they split input into tokens, chunks averaging roughly 3-4 characters of English text. For example, "The quick brown fox jumped" becomes chunks like "The", " quick", " brown", " fox", " jump" + "ed". This lets models recognize patterns such as "-ed" signaling past tense, enabling efficient handling of vast data without memorizing every word form. Use this to debug unexpected outputs: inspect token splits with a tokenizer tool (such as OpenAI's open-source tiktoken library, or Anthropic's token-counting API for Claude) to spot why rare words fragment oddly, improving prompt accuracy.
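To make this concrete, here's a minimal sketch using tiktoken as a stand-in (Claude's own tokenizer isn't published, so exact splits will differ, but the fragmentation behavior is the same idea):

```python
import tiktoken  # pip install tiktoken

# cl100k_base is one of OpenAI's encodings, used here purely for illustration.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["The quick brown fox jumped", "antidisestablishmentarianism"]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{len(tokens):2d} tokens: {pieces}")
    # Common words tend to map to one token apiece; rare words shatter into
    # several sub-word chunks, which is where odd splits creep into prompts.
```

If a prompt behaves strangely around an unusual name or term, a quick check like this often shows it splintering into fragments the model has rarely seen together.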
Training Compresses World Knowledge Without Explicit Programming
LLMs learn by ingesting billions of pages—books, websites, code—building compressed representations of patterns, not rote facts. Picture a library resident who's read everything but never left: they infer "water flows downhill" or "functions need return values" from correlations. Training typically unfolds in stages (pretraining on raw data, supervised fine-tuning, then reinforcement learning from human feedback and related alignment tuning), turning raw statistics into probabilistic predictions. This explains why LLMs autocomplete like supercharged phone keyboards: they predict the next token from statistical likelihoods in the training data, excelling at essays, math, or code but failing on novel logic outside learned patterns.
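The "supercharged autocomplete" idea can be shown with a toy next-token predictor built from raw counts; real LLMs learn these statistics with neural networks over billions of tokens, but the principle is the same (the tiny corpus here is invented):

```python
from collections import Counter, defaultdict

# Count which token follows which in a tiny whitespace-tokenized corpus.
corpus = "the cat sat . the cat ran . the dog sat .".split()

follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1  # how often nxt appears right after prev

def predict(token: str) -> str:
    # Pick the continuation observed most often: pure statistics, no rules.
    return follow[token].most_common(1)[0][0]

print(predict("the"))  # 'cat' (seen twice) beats 'dog' (seen once)
```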
Context and Temperature Control Runtime Behavior
Each interaction uses a context window as working memory: a whiteboard holding your prompt, files, and prior exchanges until it fills up (in chat interfaces, the oldest content is typically dropped first). Typical limits (e.g., Claude's 200K tokens) mean you must summarize long histories to fit, preventing overload and the hallucinations that come from forgotten details. Generation applies temperature (0-1 scale): low (0.0-0.3) yields predictable, factual outputs; high (0.7-1.0) boosts creativity via diverse token sampling, like weighted dice rolls, as the sketches below illustrate. Key tactic: send the identical query five times at low temperature, then evaluate the ensemble. A single response is an anecdote; multiple responses reveal quality, elevating you from casual to pro user.
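Here's a minimal sketch of how temperature reshapes those weighted dice rolls; the candidate tokens and logit scores are invented for illustration:

```python
import math
import random

def sample(logits: dict[str, float], temperature: float) -> str:
    """Scale logits by 1/temperature, softmax, then draw one token."""
    if temperature == 0:
        return max(logits, key=logits.get)       # greedy: always the top token
    scaled = [score / temperature for score in logits.values()]
    top = max(scaled)                            # subtract max for stability
    weights = [math.exp(s - top) for s in scaled]
    return random.choices(list(logits), weights=weights, k=1)[0]

logits = {"downhill": 2.0, "uphill": 0.5, "sideways": 0.1}
print([sample(logits, 0.2) for _ in range(5)])   # near-deterministic
print([sample(logits, 1.0) for _ in range(5)])   # noticeably more varied
```

Low temperature concentrates nearly all probability on the top-scoring token; raising it spreads probability across the alternatives.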
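And the query-five-times tactic, sketched with Anthropic's Python SDK (the model name below is a placeholder; check your provider's current list):

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Same prompt, five runs, temperature 0: agreement suggests solid ground,
# divergence suggests the model is guessing.
answers = []
for _ in range(5):
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model ID
        max_tokens=100,
        temperature=0.0,
        messages=[{"role": "user", "content": "What year was Unicode 1.0 released?"}],
    )
    answers.append(msg.content[0].text)

print(answers)  # five anecdotes become a sample you can actually evaluate
```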
Five Hard Limits Define LLM Boundaries
LLMs can't:
1) access the real-time internet (no browsing mid-chat);
2) retain persistent memory (each new chat starts blank);
3) perform physical actions (no sending emails or clicking buttons);
4) know about events after their training cutoff (you must supply updates yourself);
5) learn from user corrections (one chat doesn't retrain the model).
These limits force workarounds, sketched below, like external tools for web data or databases for memory, keeping production use reliable.
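Both workaround patterns fit in a few lines; this sketch assumes a stand-in ask_llm() function (the name is hypothetical, wire it to whatever SDK you use):

```python
import sqlite3
import urllib.request

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for your provider's SDK call.
    raise NotImplementedError("wire this to your LLM client")

# Limits 1 and 4: the model can't browse or know recent events, so fetch
# current data yourself and paste it into the prompt.
def ask_with_web_context(url: str, question: str) -> str:
    page = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    return ask_llm(f"Using only this page:\n{page[:4000]}\n\nQuestion: {question}")

# Limit 2: each chat starts blank, so persist facts in a database and
# replay them at the top of every new conversation.
db = sqlite3.connect("memory.db")
db.execute("CREATE TABLE IF NOT EXISTS notes (fact TEXT)")

def remember(fact: str) -> None:
    db.execute("INSERT INTO notes VALUES (?)", (fact,))
    db.commit()

def ask_with_memory(question: str) -> str:
    facts = "\n".join(row[0] for row in db.execute("SELECT fact FROM notes"))
    return ask_llm(f"Known facts:\n{facts}\n\nQuestion: {question}")
```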