AI Intelligence: Compression Over Scale
True intelligence compresses data into minimal algorithmic rules via minimum description length (MDL), rather than memorizing petabytes. A 76,000-parameter model solves 20% of ARC puzzles at inference time, outpacing trillion-parameter LLMs through neuro-symbolic code generation.
Scale Fails Where Compression Succeeds
Current trillion-parameter LLMs memorize internet-scale data yet fail novel reasoning tasks like ARC puzzles, scoring near zero where humans reach ~90% through hypothesis generation and backtracking. They interpolate within their training distribution (the Manifold Hypothesis) but hallucinate on out-of-distribution problems, behaving as 'stochastic parrots' (Bender et al., 2021). Chollet's intelligence ratio, skill / (data × compute), exposes their inefficiency: planetary-scale data and server farms to acquire basic concepts.
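Rendered as a formula (a simplification; Chollet's full measure in On the Measure of Intelligence also discounts built-in priors and weights generalization difficulty):

```latex
\text{Intelligence} \;\propto\; \frac{\text{skill acquired}}{\text{data consumed} \times \text{compute spent}}
```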
MDL redefines intelligence as the shortest program that explains the data: Occam's Razor for code. CompressARC proves it: a zero-pretrained 76,000-parameter model solves 20% of ARC at inference time by searching compressed algorithmic states, disrupting the brute-force trend (Liao & Gu, 2025). Build reasoning agents that prioritize sample efficiency; a system that needs millions of examples is a database, not an intelligence.
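A minimal sketch of the two-part MDL score, using zlib-compressed byte length as a crude proxy for description length (real MDL uses proper prefix codes); the data and the two candidate hypotheses here are illustrative:

```python
import zlib

def mdl_score(program_src: str, data: bytes, reconstruction: bytes) -> int:
    # Two-part code: L(H) = bits to encode the hypothesis itself,
    # L(D|H) = bits to encode what the hypothesis gets wrong (the residual).
    model_bits = 8 * len(zlib.compress(program_src.encode()))
    residual = bytes(a ^ b for a, b in zip(data, reconstruction))
    data_bits = 8 * len(zlib.compress(residual))
    return model_bits + data_bits

data = bytes(range(256)) * 4  # 1 KiB of highly regular "observations"

# Hypothesis A: memorize the data verbatim (big model, perfect fit).
memo_src = f"TABLE = {list(data)!r}"
# Hypothesis B: a short generative rule (small model, perfect fit).
rule_src = "out = bytes(i % 256 for i in range(1024))"
rule_out = bytes(i % 256 for i in range(1024))

print("memorizer:", mdl_score(memo_src, data, data))      # pays for the whole table
print("rule:     ", mdl_score(rule_src, data, rule_out))  # rule compresses it away
```

The memorizer's score grows with the dataset while the rule's stays constant, which is exactly the sample-efficiency gap CompressARC exploits.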
Neuro-Symbolic Shift: LLM + Code for Verifiable Reasoning
The field has moved through epochs: from rigid symbolic AI, which drowned in combinatorial explosion (Ellis et al., 2021), to flawed text prompting, where serializing grids destroys their geometry (Moskvichev et al., 2023). Now, ARC-AGI-3 uses Kahneman's dual-process model: a System 1 LLM generates Python hypotheses; a System 2 interpreter executes and debugs them in feedback loops (Gao et al., 2023). Code output enables static analysis, theorem provers (Z3), and auditability, making it safer than natural language for enterprises.
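A minimal sketch of that generate-execute-debug loop, assuming a stand-in llm_propose(feedback) -> source for any code-generating model; the function names and feedback strings are illustrative, not a specific system's API:

```python
from typing import Callable

def solve_arc_task(train_pairs, llm_propose: Callable[[str], str], max_rounds: int = 5):
    """System 1 proposes a Python hypothesis; System 2 executes and verifies it.
    train_pairs is a list of (input_grid, output_grid) examples."""
    feedback = "Write a function transform(grid) mapping each input grid to its output."
    for _ in range(max_rounds):
        src = llm_propose(feedback)          # System 1: fast, fallible generation
        scope: dict = {}
        try:
            exec(src, scope)                 # System 2: actually run the hypothesis
            fn = scope["transform"]
            if all(fn(x) == y for x, y in train_pairs):
                return src                   # program verified on every example
            feedback = f"Wrong output on a training pair. Previous code:\n{src}"
        except Exception as exc:             # crashes become debugging feedback
            feedback = f"Raised {exc!r}. Previous code:\n{src}"
    return None                              # budget exhausted, no verified program
```

Because the artifact is a program, the verification step can be swapped for static analyzers or an SMT solver without touching the generation side.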
Active inference (o1, DeepSeek-R1) adds iterative search: synthesize code, run it, analyze the diffs, self-improve. Tool orchestration (ViperGPT) routes subproblems to external verifiers. LARC shows ARC logic translates into natural language (Acquaviva et al., 2022), and LLMs act as 'General Pattern Machines' over such sequences (Mirchandani et al., 2023). AlphaCode enforces modular program structure, boosting reasoning (Li et al., 2022). A 1.5B-parameter distilled model crushes 13B baselines via test-time logic (Anjum, 2025).
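Routing to an external verifier can be as direct as handing a generated claim to Z3; a hedged sketch, with a toy claim standing in for whatever property the agent's code must satisfy:

```python
from z3 import Int, Solver, unsat

# Generated claim to audit: "x * (x + 1) is always even."
x = Int("x")
s = Solver()
s.add(x * (x + 1) % 2 != 0)  # ask Z3 for a counterexample

result = s.check()
print("claim verified" if result == unsat else f"counterexample: {s.model()}")
```

An unsat answer is a proof over all integers, not a spot check; that is the auditability gap between code and free-form text.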
Trade-offs and Democratization Path
Test-time compute inflates inference costs (thousands of generated scripts per task), risks unbounded loops, and fuels benchmark races (ARC-AGI-3's interactive environments). Yet Program-Aided Distillation (PaD) transfers verified reasoning trajectories to small open-source models, enabling local System-2 AI, sidestepping copyright by synthesizing programs natively, and preserving auditability (Zhu et al., 2024). Pivot to neuro-symbolic agents over oracles for safe, efficient AGI.
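A hedged sketch of the PaD recipe's core filter, with teacher_generate and verify as assumed stand-ins rather than the paper's actual API:

```python
def build_pad_dataset(problems, teacher_generate, verify, samples_per_problem=8):
    """Program-Aided Distillation, in outline: sample programs from a large
    teacher, keep only those that execute correctly, and fine-tune a small
    student on the surviving (problem, program) pairs."""
    dataset = []
    for problem in problems:
        for src in teacher_generate(problem, n=samples_per_problem):
            if verify(problem, src):           # execution filters faulty rationales
                dataset.append({"prompt": problem, "completion": src})
                break                          # one verified trajectory per problem
    return dataset
```

The interpreter, not the teacher, arbitrates correctness, which is what keeps the distilled traces auditable.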