Tokenization Breaks Text into Predictable Chunks for Processing
LLMs don't process individual letters or whole words; they split input into tokens, chunks averaging roughly 3-4 characters of English text. For example, "The quick brown fox jumped" becomes chunks like "The", " quick", " brown", " fox", " jump" + "ed". This lets models recognize patterns such as "-ed" signaling past tense, enabling efficient handling of vast data without memorizing every word form. Use this to debug unexpected outputs: inspect token splits with a tokenizer tool (such as OpenAI's open-source tiktoken library, or Anthropic's token-counting API for Claude) to spot why rare words fragment oddly, improving prompt accuracy.
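To make this concrete, here's a minimal sketch using tiktoken as a stand-in (Claude's own tokenizer isn't published, so exact splits will differ, but the fragmentation behavior is the same idea):

```python
import tiktoken  # pip install tiktoken

# cl100k_base is one of OpenAI's encodings, used here purely for illustration.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["The quick brown fox jumped", "antidisestablishmentarianism"]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{len(tokens):2d} tokens: {pieces}")
    # Common words tend to map to one token apiece; rare words shatter into
    # several sub-word chunks, which is where odd splits creep into prompts.
```

If a prompt behaves strangely around an unusual name or term, a quick check like this often shows it splintering into fragments the model has rarely seen together.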
Training Compresses World Knowledge Without Explicit Programming
LLMs learn by ingesting billions of pages—books, websites, code—building compressed representations of patterns, not rote facts. Picture a library resident who's read everything but never left: they infer "water flows downhill" or "functions need return values" from correlations. Training typically unfolds in stages (pretraining on raw data, supervised fine-tuning, then reinforcement learning from human feedback and related alignment tuning), turning raw statistics into probabilistic predictions. This explains why LLMs autocomplete like supercharged phone keyboards: they predict the next token from statistical likelihoods in the training data, excelling at essays, math, or code but failing on novel logic outside learned patterns.
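The "supercharged autocomplete" idea can be shown with a toy next-token predictor built from raw counts; real LLMs learn these statistics with neural networks over billions of tokens, but the principle is the same (the tiny corpus here is invented):

```python
from collections import Counter, defaultdict

# Count which token follows which in a tiny whitespace-tokenized corpus.
corpus = "the cat sat . the cat ran . the dog sat .".split()

follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1  # how often nxt appears right after prev

def predict(token: str) -> str:
    # Pick the continuation observed most often: pure statistics, no rules.
    return follow[token].most_common(1)[0][0]

print(predict("the"))  # 'cat' (seen twice) beats 'dog' (seen once)
```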
Context and Temperature Control Runtime Behavior
Each interaction uses a context window as working memory: a whiteboard holding your prompt, files, and prior exchanges until it fills up (in chat interfaces, the oldest content is typically dropped first). Typical limits (e.g., Claude's 200K tokens) mean you must summarize long histories to fit, preventing overload and the hallucinations that come from forgotten details. Generation applies temperature (0-1 scale): low (0.0-0.3) yields predictable, factual outputs; high (0.7-1.0) boosts creativity via diverse token sampling, like weighted dice rolls, as the sketches below illustrate. Key tactic: send the identical query five times at low temperature, then evaluate the ensemble. A single response is an anecdote; multiple responses reveal quality, elevating you from casual to pro user.
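Here's a minimal sketch of how temperature reshapes those weighted dice rolls; the candidate tokens and logit scores are invented for illustration:

```python
import math
import random

def sample(logits: dict[str, float], temperature: float) -> str:
    """Scale logits by 1/temperature, softmax, then draw one token."""
    if temperature == 0:
        return max(logits, key=logits.get)       # greedy: always the top token
    scaled = [score / temperature for score in logits.values()]
    top = max(scaled)                            # subtract max for stability
    weights = [math.exp(s - top) for s in scaled]
    return random.choices(list(logits), weights=weights, k=1)[0]

logits = {"downhill": 2.0, "uphill": 0.5, "sideways": 0.1}
print([sample(logits, 0.2) for _ in range(5)])   # near-deterministic
print([sample(logits, 1.0) for _ in range(5)])   # noticeably more varied
```

Low temperature concentrates nearly all probability on the top-scoring token; raising it spreads probability across the alternatives.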
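And the query-five-times tactic, sketched with Anthropic's Python SDK (the model name below is a placeholder; check your provider's current list):

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Same prompt, five runs, temperature 0: agreement suggests solid ground,
# divergence suggests the model is guessing.
answers = []
for _ in range(5):
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model ID
        max_tokens=100,
        temperature=0.0,
        messages=[{"role": "user", "content": "What year was Unicode 1.0 released?"}],
    )
    answers.append(msg.content[0].text)

print(answers)  # five anecdotes become a sample you can actually evaluate
```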
Five Hard Limits Define LLM Boundaries
LLMs can't:
1) access the real-time internet (no browsing mid-chat);
2) retain persistent memory (each new chat starts blank);
3) perform physical actions (no sending emails or clicking buttons);
4) know about events after their training cutoff (you must supply updates yourself);
5) learn from user corrections (one chat doesn't retrain the model).
These limits force workarounds, sketched below, like external tools for web data or databases for memory, keeping production use reliable.
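Both workaround patterns fit in a few lines; this sketch assumes a stand-in ask_llm() function (the name is hypothetical, wire it to whatever SDK you use):

```python
import sqlite3
import urllib.request

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for your provider's SDK call.
    raise NotImplementedError("wire this to your LLM client")

# Limits 1 and 4: the model can't browse or know recent events, so fetch
# current data yourself and paste it into the prompt.
def ask_with_web_context(url: str, question: str) -> str:
    page = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    return ask_llm(f"Using only this page:\n{page[:4000]}\n\nQuestion: {question}")

# Limit 2: each chat starts blank, so persist facts in a database and
# replay them at the top of every new conversation.
db = sqlite3.connect("memory.db")
db.execute("CREATE TABLE IF NOT EXISTS notes (fact TEXT)")

def remember(fact: str) -> None:
    db.execute("INSERT INTO notes VALUES (?)", (fact,))
    db.commit()

def ask_with_memory(question: str) -> str:
    facts = "\n".join(row[0] for row in db.execute("SELECT fact FROM notes"))
    return ask_llm(f"Known facts:\n{facts}\n\nQuestion: {question}")
```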