LLMs' Fatal Flaws for Mission-Critical Systems
Eve Bodnia argues that transformer-based LLMs, dominant in AI today, are fundamentally unreliable for high-stakes applications like chip design, financial analysis, or aviation controls. Their autoregressive nature—generating output token by token with no way to inspect the process midstream—leads to hallucinations, where the model commits to errors it cannot correct. "Imagine there's AI driving a car and you're in that car and that car is an LLM and someone tells you, you know, 20% of the time it's going to hallucinate and you might end up in a wrong place," Bodnia warns, a contrast with Dan Shipper's more experimental curiosity about such risks.
LLMs act as black boxes: you can't peek inside during generation to assess confidence or reasoning. Even with external verifiers like Lean 4—a machine-verifiable proof language—attached post-generation, the core issue persists. Token prediction remains a costly "guessing game": expensive in compute and too unreliable for deterministic guarantees. Shipper pushes back, noting that LLMs excel at generating useful output that can be verified with tests, but Bodnia counters that this "guess and check" loop is inefficient and doesn't guarantee that the model's internals align with its outputs.
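To make the external-verifier idea concrete, here is a minimal Lean 4 sketch (the theorem name is illustrative): a statement the proof checker either accepts or rejects, regardless of how the candidate proof was produced.

```lean
-- A trivial machine-checked statement. The Lean kernel verifies the proof
-- only after it is written; it cannot see the generator's internal confidence.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Verification of this kind is post hoc: it says whether the output is right, not whether the process that produced it was sound, which is precisely the gap Bodnia is pointing at.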
Mission-critical industries haven't widely adopted LLMs precisely because of this gap. Bodnia sees Logical Intelligence filling it by prioritizing "deterministic AI, verifiable AI," starting with software/hardware correctness.
Energy-Based Models: Physics-Inspired Alternatives
Bodnia's solution is energy-based models (EBMs), rooted in physics' energy minimization principle—think Lagrangians deriving equations of motion from kinetic and potential energy terms. EBMs are non-autoregressive and token-free, mapping all possible outcomes onto an "energy landscape": probable states settle in low-energy "valleys," improbable ones on high-energy "peaks."
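Spelled out, the analogy runs roughly as follows (a gloss on the episode, not a formula quoted from it): in classical mechanics the Lagrangian is kinetic minus potential energy and the equations of motion follow from a stationarity condition; an EBM swaps the physical energy for a learned score and turns inference into minimization.

```latex
% Classical mechanics: Lagrangian and the Euler-Lagrange equation
L(q, \dot{q}) = T - V, \qquad
\frac{d}{dt}\,\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0

% EBM analogue (schematic): a learned energy scores each candidate
% state x, and inference selects the low-energy valley
x^{*} = \arg\min_{x} E_{\theta}(x)
```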
Unlike LLMs' sequential navigation (like a left-brain pathfinder taking wrong turns without backtracking), EBMs survey the entire map upfront. "EBM going to have the first view all the time. So if you see there's a hole, you're going to choose a different route," Bodnia explains with a navigation metaphor. Her team's model, dubbed Kona (an energy-based reasoning model with latent variables), constructs these landscapes from data, enabling real-time inspection and self-alignment during training.
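As a minimal sketch of that survey-the-map idea, in Python (the route names and energy values are illustrative, not Kona's actual interface):

```python
# Toy energy landscape over route candidates. An EBM scores every
# candidate up front instead of committing to one step at a time.
routes = ["highway", "side_street", "road_with_hole"]

def energy(route: str) -> float:
    """Hand-set illustrative energies: low = plausible/safe, high = a 'hole'."""
    return {"highway": 1.2, "side_street": 2.0, "road_with_hole": 9.5}[route]

landscape = {r: energy(r) for r in routes}

# The full landscape is inspectable before any output is committed:
# the hole shows up as a high-energy peak, so it is never chosen.
best = min(landscape, key=landscape.get)
print(landscape)  # {'highway': 1.2, 'side_street': 2.0, 'road_with_hole': 9.5}
print(best)       # highway
```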
Shipper tests the concept: modeling his post-podcast behavior (ending up on the couch). An LLM might predict it via token probabilities learned from vast text data, but an EBM directly maps the observed states (tiredness, house geometry) onto the landscape without language mediation. This yields confidence scores that can be inspected before any output is committed, with external verifiers available as a second layer of assurance.
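One conventional way to read confidence off a landscape, offered here as an assumption rather than as Kona's documented method, is a Boltzmann distribution over states, p(x) proportional to exp(-E(x)):

```python
import math

# Hand-assigned energies for post-podcast states (illustrative only).
state_energy = {"couch": 0.5, "gym": 2.0, "kitchen": 3.0}

def boltzmann(energies: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Convert energies to probabilities via p(x) ~ exp(-E(x)/T)."""
    weights = {s: math.exp(-e / temperature) for s, e in energies.items()}
    z = sum(weights.values())  # partition function
    return {s: w / z for s, w in weights.items()}

print(boltzmann(state_energy))  # ≈ {'couch': 0.77, 'gym': 0.17, 'kitchen': 0.06}
```

Low energy translates to high probability, and the whole distribution is available before anything is emitted.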
EBMs are cheaper—no tokens mean no guessing compute—and controllable: "You control the training. It's no longer black box for you." Bodnia envisions hybrid use: prototype on LLMs, plug in EBMs for production.
Beyond Language: True Data Understanding
A core critique: LLMs force all intelligence through language, distorting non-verbal tasks. Human reasoning is abstract, works across languages, and is ultimately language-independent; an LLM's token chains vary with its training language, yielding inconsistent reasoning processes. Driving a car or navigating a house relies on visual-spatial data, not word prediction—yet LLMs must first embed that data into language space.
"Intelligence which is language-dependent... feels really wrong," Bodnia asserts. "When you drive a car, when you walk around your house, how much language you actually use? Are you trying to predict next word...? Probably not."
EBMs process raw data in its native modality, constructing landscapes that reveal underlying "laws" (e.g., conservation principles). Shipper suggests sequence modeling via movement tokens; Bodnia agrees it's viable but unnecessary—EBMs handle it natively, without language crutches.
This enables "understanding" as structural insight, not statistical correlation. Observing Shipper repeatedly, an EBM learns his "equation of motion": tired → couch (lowest valley), gym as secondary low point.
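A toy version of learning that "equation of motion" from repeated observation, using frequency counts as a stand-in for a trained energy function (both the data and the choice E(x) = -log p̂(x) are assumptions for illustration):

```python
import math
from collections import Counter

# Repeated post-podcast observations (made-up illustrative data).
observations = ["couch"] * 7 + ["gym"] * 2 + ["kitchen"] * 1

counts = Counter(observations)
total = sum(counts.values())

# Empirical energy: states observed often sit in low-energy valleys.
energy = {state: -math.log(n / total) for state, n in counts.items()}
print(energy)  # ≈ {'couch': 0.36, 'gym': 1.61, 'kitchen': 2.30}
# 'couch' is the deepest valley; 'gym' the secondary low point.
```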
Verifiable Code from Plain English
EBMs tackle "vibe coding"—LLM-generated code that feels right but fails scrutiny. By enabling formal verification driven by plain-English specifications (no C++ needed), they produce certifiably correct outputs. Internal verifiers assess solution quality mid-process; the energy landscape quantifies confidence.
Logical Intelligence targets code generation and chip design, where LLMs falter. Bodnia predicts EBMs will bridge the adoption gap in banking, aviation, and beyond, enabling automation without the risk.
Signs of LLM Plateau and EBM Momentum
Bodnia observes LLM progress stalling: scaling laws are yielding diminishing returns as language models hit their ceiling. Non-language tasks expose the limits, and mission-critical sectors demand alternatives.
"LLM progress is plateauing," she states around the 43-minute mark. EBMs, inspectable and efficient, position Logical Intelligence as a foundational player. Shipper probes the trade-offs, but Bodnia emphasizes EBMs' universality for verifiable AI everywhere.
Key Takeaways
- Prioritize internal verifiers in AI architecture for mission-critical tasks; LLMs' black-box token generation can't self-correct hallucinations.
- Build energy landscapes to model data: map states to valleys/peaks for probabilistic navigation without sequences.
- Ditch language dependency—process visual/spatial data natively to avoid embedding distortions in non-verbal reasoning.
- Combine EBM self-alignment with external tools like Lean 4 for double verification, slashing compute costs.
- Prototype on LLMs, deploy EBMs: hybrids accelerate verifiable code gen and chip design from plain English.
- Watch LLM scaling plateau; physics-based models like EBMs unlock deterministic AI for aviation, finance, and automation.
- Inspect models in real time during training to control outcomes—EBMs make AI transparent, not a post-hoc guess.
- For behavior prediction (e.g., post-work routines), observe states directly; energy minimization reveals 'laws' like tired → relax.