Matrix Compression Guarantees Information Loss in Every Layer

Neural network layers perform one core operation: multiplying inputs by a weight matrix. When a layer has fewer output dimensions than input dimensions, as is standard practice to reduce parameters and compute, it compresses. A 2×3 matrix, for example, maps 3D inputs to 2D outputs, permanently discarding at least one dimension of information. The Rank-Nullity Theorem (proved 1884) quantifies this: rank (the dimension of the image) plus nullity (the dimension of the null space) equals the input dimension. Here rank ≤ 2, so nullity ≥ 1, meaning at least one direction of input variation is erased. You can verify this by hand: for a 2×3 matrix A, find a non-zero vector x with Ax = 0; any two inputs that differ only along x become indistinguishable after multiplication.
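To make this concrete, here is a minimal NumPy sketch (the specific 2×3 matrix is an illustrative choice, not from the original): it checks that rank + nullity equals the input dimension and exhibits the erased direction explicitly.

```python
import numpy as np
from scipy.linalg import null_space

# An illustrative 2x3 "layer": maps 3D inputs to 2D outputs.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

rank = np.linalg.matrix_rank(A)       # dimension of the image
N = null_space(A)                     # orthonormal basis of the null space
nullity = N.shape[1]                  # dimension of the null space

print(rank, nullity, rank + nullity)  # 2 1 3 -> rank + nullity = input dim
print(A @ N[:, 0])                    # ~[0, 0]: variation along this direction is erased
```

Any multiple of the null-space basis vector is invisible to this layer: adding it to an input changes nothing about the output.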

Null Space Directly Causes Hallucinations

Hallucinations arise when true and false facts differ only within the null space. The model cannot distinguish them because the layer mapping collapses those differences to zero. This is not a training flaw or 'stupidity'; the linear algebra forbids the distinction. In the 2×3 case, inputs that vary along the null-space direction produce identical outputs, so the network 'genuinely cannot tell' fact from fiction. The same holds for every compressing layer, and the effect compounds across the network: stacked compressions enlarge the blind spots.
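A sketch of the same point in code, reusing the illustrative 2×3 layer from above (the embeddings here are made-up placeholders, not real model states): two inputs that differ only by a null-space vector produce identical outputs, and stacking a second compressing layer grows the erased subspace.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])       # the illustrative 2x3 layer again
n = np.array([1.0, 1.0, -1.0])        # a null-space vector: A @ n == 0

x_true  = np.array([0.2, 0.5, 0.3])   # placeholder "true fact" embedding
x_false = x_true + 0.7 * n            # differs from x_true only inside the null space

print(np.allclose(A @ x_true, A @ x_false))  # True: the layer cannot tell them apart

# Compounding: a second compressing layer (2D -> 1D) enlarges the blind spot.
B = np.array([[1.0, 1.0]])
print(null_space(B @ A).shape[1])     # 2: the composed map erases a 2D subspace
```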

Implications: Hallucination Cannot Be Eliminated, Only Managed

Since compression is baked into the architecture for efficiency, zero hallucination would defy the math. Instead, the geometry guides mitigation: expand dimensions to shrink the nullity (at a steep cost in compute and parameters), or steer prompts and data away from known null spaces via RAG or fine-tuning. The proof fits on a napkin: write down your own 2×3 matrix and compute its null space explicitly. This shifts the goal from 'fixing' hallucination to engineering around inevitable losses.
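A sketch of the dimension-expansion trade-off, assuming random Gaussian weights as a stand-in for trained ones: a compressing layer always has nullity of at least one, while a widening layer generically has none.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)

W_narrow = rng.standard_normal((2, 3))  # compressing: 3D -> 2D
W_wide   = rng.standard_normal((4, 3))  # expanding:   3D -> 4D

print(null_space(W_narrow).shape[1])    # 1: a blind spot is guaranteed
print(null_space(W_wide).shape[1])      # 0: generically, nothing is erased
```

The trade-off is visible in the shapes: eliminating the null space requires carrying at least as many outputs as inputs, which is exactly the parameter and compute cost the compression was meant to avoid.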