Core AI Architectures Powering Modern Tools
Large language models (LLMs) underpin assistants like ChatGPT, Claude, Gemini, Llama, Copilot, and Le Chat. These deep neural networks, with billions of parameters (weights), learn word relationships from vast datasets of books, articles, and transcripts. When prompted, they predict the most likely next tokens. Neural networks form their backbone: multi-layered structures loosely modeled on the brain's neurons, enabling deep learning to discover data features automatically rather than through manual engineering. Deep learning needs millions of data points or more and extended training, driving high costs but capturing complex correlations beyond simpler ML methods like decision trees.
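The core next-token step can be sketched in a few lines. This is a toy illustration, not a real LLM: the vocabulary and logits below are invented, and a real model produces its scores from billions of learned weights.

```python
import math

def softmax(logits):
    # Turn raw model scores into probabilities that sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to each vocabulary token
# as the continuation of "the cat ...".
vocab = ["cat", "dog", "sat", "mat"]
logits = [0.2, 0.1, 2.5, 0.4]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "sat" — the highest-scoring token wins
```

Real systems usually sample from this distribution (with temperature) rather than always taking the argmax, which is why outputs vary between runs.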
AGI remains vague: Sam Altman calls it a 'median human co-worker'; OpenAI's charter defines it as autonomous systems outperforming humans in most economically valuable work; DeepMind sees it as matching humans on cognitive tasks. Even experts disagree, so prioritize narrow capabilities over chasing AGI hype when building.
Training, Optimization, and Deployment Trade-offs
Distillation transfers knowledge from a large 'teacher' model to a smaller 'student' by recording outputs and retraining—creating efficient versions like GPT-4 Turbo. It risks ToS violations if distilling competitors' APIs. Fine-tuning adapts pre-trained LLMs with domain-specific data for targeted tasks, letting startups specialize general models.
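The distillation signal described above can be sketched numerically. Everything here is illustrative (toy logits, a hypothetical temperature): the key idea is that the student is trained to match the teacher's *softened* output distribution, typically by minimizing a KL-divergence loss.

```python
import math

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the
    # teacher's relative preferences among wrong answers too.
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # How far the student's distribution q is from the teacher's p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [3.0, 1.0, 0.2]   # recorded teacher outputs (toy values)
student_logits = [2.5, 1.2, 0.1]   # smaller student's current outputs
T = 2.0

teacher_soft = softmax(teacher_logits, T)
student_soft = softmax(student_logits, T)
loss = kl_divergence(teacher_soft, student_soft)
print(round(loss, 4))  # training drives this toward zero
```

In practice this loss is usually combined with the ordinary hard-label loss, and the student is updated by gradient descent rather than inspected by hand.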
Inference runs trained models to generate predictions; it demands optimized hardware (GPUs, TPUs), as large models crawl on laptops. Caching techniques such as key-value (KV) caching speed this up in transformers by reusing attention computations from earlier tokens, slashing power use and latency for repeated queries. Compute denotes the GPUs/CPUs fueling training and inference—the AI economy's bottleneck.
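The KV-caching idea can be shown with a deliberately simplified attention step (scalar "embeddings," identity projections — all assumptions for illustration). Without the cache, the keys and values for every past token would be recomputed at each generation step; with it, each step appends exactly one new key/value pair and reuses the rest.

```python
import math

def attend(query, keys, values):
    # Scalar stand-in for attention: dot-product scores, softmax,
    # weighted sum of values.
    scores = [query * k for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))

k_cache, v_cache = [], []
tokens = [0.5, 1.0, -0.3]   # hypothetical token embeddings
outputs = []
for t in tokens:
    # Only the NEW token's key/value are computed; earlier ones are reused.
    k_cache.append(t)
    v_cache.append(t)
    outputs.append(attend(t, k_cache, v_cache))
print(len(k_cache))  # 3 cached entries after 3 steps
```

The work per step stays proportional to the sequence length instead of its square, which is where the latency and power savings come from.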
Hallucinations occur when LLMs fabricate facts from training gaps, risking misinformation (e.g., bad medical advice). Mitigate with domain-specific fine-tuning to close knowledge holes.
Generation Techniques and Reasoning Boosts
Diffusion models generate art/music/text by learning to reverse 'noise destruction' of data, enabling realistic outputs from randomness. GANs pit generator vs. discriminator networks to refine fakes like deepfakes, best for narrow tasks like images/videos.
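The "noise destruction" a diffusion model learns to reverse is easy to write down. This sketch shows only the forward (noising) process with a made-up noise level; the trained network's job is the reverse step, which is not shown here.

```python
import math
import random

random.seed(0)

def noisy_sample(x0, alpha_bar):
    # Standard forward-diffusion form:
    # x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise,
    # where alpha_bar near 1 means "barely noised" and near 0
    # means "almost pure noise".
    noise = random.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * noise

x0 = 1.0                         # a "clean" data point
early = noisy_sample(x0, 0.99)   # still close to the data
late = noisy_sample(x0, 0.01)    # nearly indistinguishable from noise
print(early, late)
```

Generation runs this destruction in reverse: starting from pure noise, the model repeatedly predicts and removes a little noise until a realistic sample remains.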
Chain-of-thought prompting breaks problems into explicit steps (e.g., the classic heads-and-legs riddle, whose answer is 20 chickens and 20 cows), improving LLM accuracy on logic and coding via reasoning models optimized with reinforcement learning. This trades speed for reliability.
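Working the riddle step by step, the way chain-of-thought prompting encourages a model to, makes the arithmetic transparent. The head and leg counts here are chosen so the answer matches the text (20 chickens, 20 cows).

```python
heads = 40   # each animal has one head
legs = 120   # chickens have 2 legs, cows have 4

# Step 1: if all 40 animals were chickens, there would be 2 * 40 = 80 legs.
# Step 2: each cow swapped in for a chicken adds 2 extra legs,
#         so the surplus legs / 2 gives the cow count.
cows = (legs - 2 * heads) // 2
# Step 3: the remaining heads belong to chickens.
chickens = heads - cows
print(chickens, cows)  # 20 20
```

A model answering in one leap can easily botch this; forcing it to emit the intermediate steps gives each step a chance to be checked, which is exactly the reliability-for-speed trade the text describes.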
Agents Unlock Autonomous Workflows
AI agents chain LLMs with tools for multi-step tasks like booking or expense filing, using API endpoints as 'buttons' to control services autonomously. Coding agents extend this to dev workflows: writing, testing, debugging, and fixing code across repos—like tireless interns needing review.
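The agent pattern above reduces to a loop: a planner decides the next tool call, a dispatcher executes it, and the result feeds back in until the goal is met. Every name in this sketch is hypothetical; a real agent would put an LLM where `plan_next_step` is and real API clients behind the tools.

```python
# Toy "tools" standing in for real API endpoints (the 'buttons').
def book_flight(destination):
    return f"booked flight to {destination}"

def file_expense(amount):
    return f"filed expense for ${amount}"

TOOLS = {"book_flight": book_flight, "file_expense": file_expense}

def plan_next_step(goal, history):
    # Stub standing in for an LLM's decision.
    # Returns (tool_name, argument), or None when the task is done.
    if not history:
        return ("book_flight", goal["destination"])
    if len(history) == 1:
        return ("file_expense", goal["cost"])
    return None

def run_agent(goal):
    history = []
    while (step := plan_next_step(goal, history)) is not None:
        tool, arg = step
        history.append(TOOLS[tool](arg))  # execute the chosen tool
    return history

result = run_agent({"destination": "Lisbon", "cost": 420})
print(result)
```

The loop structure is also why agents need review, as the text's "tireless intern" framing suggests: a bad planner decision gets executed against real services just as readily as a good one.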
The supporting infrastructure still lags, but agents amplify automation; pair them with retrieval-augmented generation (RAG, not detailed here) to ground outputs and curb hallucinations.