Neuro-Symbolic Drive: Grounding VLA Reasoning in Classical Logic

Bridging the Gap Between Reasoning and Motion

Vision-Language-Action (VLA) models often struggle with "hallucinated" reasoning, where the natural language Chain-of-Thought (CoT) is not causally linked to the trajectory the vehicle actually follows. Neuro-Symbolic Drive addresses this by grounding the VLA's reasoning in the logic of classical rule-based planners. Rather than relying on post-hoc alignment, the framework takes the internal decision traces of symbolic planners, which inherently handle safety constraints and maneuver selection, as the ground truth for the VLA's reasoning process.

Converting Symbolic Logic into Structured Supervision

The core innovation is the instrumentation of classical planners within a simulation environment. As the planner evaluates rules to select a trajectory, the system captures the specific rule-evaluation steps and serializes them into structured reasoning traces. These traces are then paired with the executed trajectories to fine-tune the Qwen3.5-4B model. By training the VLA on these traces, the model learns to generate reasoning that is structurally coupled to its motion output by construction.
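The capture step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the rule names, the scene fields, the trace format, and the helper functions are hypothetical and are not taken from the framework itself. The point is only that each rule evaluation is logged as the planner runs, so the serialized reasoning and the chosen maneuver come from the same decision path.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RuleStep:
    rule: str      # hypothetical rule name
    passed: bool   # whether the rule's condition held
    detail: str    # the values the rule looked at


@dataclass
class Trace:
    steps: list = field(default_factory=list)
    maneuver: str = ""


def plan(scene, trace):
    """Toy rule-based planner that logs every rule it evaluates."""
    def check(rule, passed, detail):
        trace.steps.append(RuleStep(rule, passed, detail))
        return passed

    gap, min_gap = scene["lead_gap_m"], scene["min_gap_m"]
    if check("red_light_ahead", scene["light"] == "red", f"light={scene['light']}"):
        trace.maneuver = "stop"
    elif check("lead_gap_below_min", gap < min_gap, f"gap={gap}m, min={min_gap}m"):
        trace.maneuver = "follow"
    else:
        trace.maneuver = "cruise"
    return trace.maneuver


def to_training_example(scene, trajectory):
    """Pair the serialized rule trace with the executed trajectory."""
    trace = Trace()
    plan(scene, trace)
    lines = [
        f"{i + 1}. {s.rule}: {'yes' if s.passed else 'no'} ({s.detail})"
        for i, s in enumerate(trace.steps)
    ]
    lines.append(f"Decision: {trace.maneuver}")
    return {"reasoning": "\n".join(lines), "trajectory": trajectory}


if __name__ == "__main__":
    scene = {"light": "green", "lead_gap_m": 12.0, "min_gap_m": 20.0}
    example = to_training_example(scene, [[0.0, 0.0], [1.0, 0.0]])
    print(json.dumps(example, indent=2))
```

In this toy version the reasoning text cannot disagree with the maneuver, because both are produced by the same sequence of rule checks. That is the property the training data is meant to carry over to the fine-tuned model.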

Performance Gains in Autonomous Driving

The approach improves trajectory accuracy on simulator-generated benchmarks. Under three-camera perception, the framework reduced the Average Displacement Error at a 3-second horizon (ADE@3s) from 0.47 to 0.26 and the miss rate from 8.30% to 6.40%. With eight-camera perception, ADE@3s dropped from 0.54 to 0.26, and the miss rate fell from 10.13% to 5.99%. These results suggest that training VLAs on rule-grounded reasoning carries symbolic planning logic over into more reliable, faithful motion generation.
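For readers unfamiliar with these metrics, the sketch below shows how they are commonly computed. It rests on assumptions: ADE is taken as the mean Euclidean distance between predicted and ground-truth waypoints over the horizon, and a "miss" is taken as a final-waypoint error above a threshold. The summary does not state the benchmark's exact definitions, units, or threshold, so the default threshold here is a placeholder.

```python
import math


def ade(pred, gt):
    """Mean Euclidean distance between matching waypoints of one trajectory."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def miss_rate(preds, gts, threshold=2.0):
    """Fraction of trajectories whose final waypoint is off by more than threshold.

    The threshold value is an assumption, not the benchmark's setting.
    """
    misses = sum(math.dist(p[-1], g[-1]) > threshold for p, g in zip(preds, gts))
    return misses / len(gts)


def relative_reduction(before, after):
    """Relative improvement, e.g. 0.25 means the metric fell by 25%."""
    return (before - after) / before


if __name__ == "__main__":
    # Reported figures from the text above, expressed as relative reductions.
    for name, before, after in [
        ("ADE@3s, 3 cameras", 0.47, 0.26),
        ("miss rate, 3 cameras", 8.30, 6.40),
        ("ADE@3s, 8 cameras", 0.54, 0.26),
        ("miss rate, 8 cameras", 10.13, 5.99),
    ]:
        print(f"{name}: {relative_reduction(before, after):.1%} lower")
```

Applied to the reported numbers, this gives relative ADE@3s reductions of about 45% (three cameras) and 52% (eight cameras), and relative miss-rate reductions of about 23% and 41%.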

