Bridging the Gap Between Reasoning and Motion

Vision-Language-Action (VLA) models often struggle with "hallucinated" reasoning, where the natural-language Chain-of-Thought (CoT) is not causally linked to the trajectory the vehicle actually follows. Neuro-Symbolic Drive addresses this by grounding the VLA's reasoning in the logic of classical rule-based planners. Rather than relying on post-hoc alignment, the framework uses the internal decision traces of symbolic planners, which explicitly encode safety constraints and maneuver selection, as the ground truth for the VLA's reasoning process.

Converting Symbolic Logic into Structured Supervision

The core innovation is the instrumentation of classical planners within a simulation environment. As a planner evaluates its rules to select a trajectory, the system captures each rule-evaluation step and serializes the sequence into a structured reasoning trace. Each trace is then paired with the trajectory the planner executed, and these pairs are used to fine-tune the Qwen3.5-4B model. Because the reasoning targets come from the same decision process that produced the motion, the fine-tuned VLA learns to generate reasoning that is structurally coupled to its motion output by construction.
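To make the instrumentation step concrete, here is a minimal sketch under stated assumptions. The toy planner, its three rules (`red_light`, `safe_gap`, `speed_limit`), the two-second gap threshold, the per-maneuver accelerations, and the trace text format are all hypothetical illustrations, not the framework's actual planner or serialization schema. What it does show is the key property: the planner records every rule it evaluates, and the maneuver that ends the trace is the same one used to roll out the trajectory, so the reasoning and the motion in each training example come from a single decision.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy, hypothetical planner: rules, thresholds, and trace format are illustrative only.
ACCEL: Dict[str, float] = {"stop": -3.0, "follow": -1.0, "accelerate": 1.0, "keep_speed": 0.0}


@dataclass
class Scene:
    ego_speed: float            # ego speed in m/s
    lead_gap: Optional[float]   # gap to the lead vehicle in m; None if there is none
    light: str                  # traffic-light state, e.g. "green" or "red"
    speed_limit: float          # m/s


@dataclass
class RuleStep:
    rule: str
    condition: str
    fired: bool


@dataclass
class TracedDecision:
    maneuver: str = "keep_speed"
    steps: List[RuleStep] = field(default_factory=list)


def plan_with_trace(scene: Scene) -> TracedDecision:
    """Evaluate rules in priority order, recording every rule that is checked."""
    decision = TracedDecision()

    def check(rule: str, condition: str, fired: bool) -> bool:
        decision.steps.append(RuleStep(rule, condition, fired))
        return fired

    # The first rule that fires selects the maneuver; later rules are never evaluated.
    if check("red_light", f"light == {scene.light!r}", scene.light == "red"):
        decision.maneuver = "stop"
    elif check(
        "safe_gap",
        f"lead_gap={scene.lead_gap} < 2.0 s * ego_speed={scene.ego_speed}",
        scene.lead_gap is not None and scene.lead_gap < 2.0 * scene.ego_speed,
    ):
        decision.maneuver = "follow"
    elif check(
        "speed_limit",
        f"ego_speed={scene.ego_speed} < speed_limit={scene.speed_limit}",
        scene.ego_speed < scene.speed_limit,
    ):
        decision.maneuver = "accelerate"
    return decision


def rollout(
    scene: Scene, maneuver: str, horizon: float = 3.0, dt: float = 0.5
) -> List[Tuple[float, float]]:
    """Constant-acceleration rollout along the lane; returns (x, y) waypoints in m."""
    x, v, waypoints = 0.0, scene.ego_speed, []
    for _ in range(round(horizon / dt)):
        v = max(0.0, v + ACCEL[maneuver] * dt)  # speed never goes negative
        x += v * dt
        waypoints.append((round(x, 3), 0.0))
    return waypoints


def serialize_trace(decision: TracedDecision) -> str:
    """Render the recorded rule evaluations as a step-by-step text trace."""
    lines = [
        f"Step {i}: {s.rule} [{s.condition}] -> {'fired' if s.fired else 'not fired'}"
        for i, s in enumerate(decision.steps, start=1)
    ]
    lines.append(f"Decision: {decision.maneuver}")
    return "\n".join(lines)


def make_training_example(scene: Scene) -> dict:
    """Pair the serialized trace with the trajectory produced by the same decision."""
    decision = plan_with_trace(scene)
    return {
        "input": asdict(scene),
        "reasoning": serialize_trace(decision),
        "maneuver": decision.maneuver,
        "trajectory": rollout(scene, decision.maneuver),
    }


if __name__ == "__main__":
    example = make_training_example(Scene(ego_speed=10.0, lead_gap=12.0, light="green", speed_limit=15.0))
    print(json.dumps(example, indent=2))
```

For the example scene, the `red_light` rule does not fire, `safe_gap` does (12 m is less than a 2 s gap at 10 m/s), and the trace ends in `Decision: follow`, which is also the maneuver whose rollout fills the `trajectory` field. In the framework itself, the trajectory would be the one the planner executed in simulation rather than this toy rollout.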

Performance Gains in Autonomous Driving

This approach substantially improves driving performance on simulator-generated benchmarks. Under three-camera perception, the framework reduced the Average Displacement Error over a 3-second horizon (ADE@3s) from 0.47 to 0.26, a relative reduction of about 45%, and lowered the miss rate from 8.30% to 6.40%. With eight-camera perception, ADE@3s dropped from 0.54 to 0.26 (about 52% lower), and the miss rate fell from 10.13% to 5.99%. These results indicate that providing VLAs with rule-grounded reasoning effectively translates symbolic planning logic into more reliable, faithful motion generation.
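The headline metrics can be made concrete with a short sketch. It assumes the common definitions: ADE@3s is the mean Euclidean distance between predicted and ground-truth waypoints over the 3-second horizon, and a prediction counts as a miss when its final waypoint lies more than a fixed threshold from the ground truth. The 2 m threshold and the helper names (`ade`, `is_miss`, `evaluate`, `relative_reduction`) are hypothetical; the benchmark's exact miss criterion is not specified here. The last helper computes the relative ADE reductions implied by the reported numbers.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ade(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average displacement error: mean L2 distance over time-matched waypoints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def is_miss(pred: Sequence[Point], gt: Sequence[Point], threshold_m: float = 2.0) -> bool:
    """Hypothetical miss criterion: final-waypoint error strictly exceeds threshold_m."""
    return math.dist(pred[-1], gt[-1]) > threshold_m


def evaluate(
    preds: List[Sequence[Point]], gts: List[Sequence[Point]], threshold_m: float = 2.0
) -> Tuple[float, float]:
    """Return (mean ADE in m, miss rate in %) over a set of scenarios."""
    mean_ade = sum(ade(p, g) for p, g in zip(preds, gts)) / len(preds)
    misses = sum(is_miss(p, g, threshold_m) for p, g in zip(preds, gts))
    return mean_ade, 100.0 * misses / len(preds)


def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    preds = [[(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], [(0.0, 0.0)] * 3]
    gts = [[(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)], [(0.0, 0.0), (0.0, 0.0), (3.0, 4.0)]]
    # mean ADE = (1.0 + 5/3) / 2 ≈ 1.333 m; only the second scenario is a miss -> 50.0 %
    print(evaluate(preds, gts))
    print(round(relative_reduction(0.47, 0.26), 1))  # 44.7 (three-camera ADE@3s)
    print(round(relative_reduction(0.54, 0.26), 1))  # 51.9 (eight-camera ADE@3s)
```

Because miss criteria differ across benchmarks (some use separate longitudinal and lateral thresholds rather than a single radius), `is_miss` is the part of this sketch most likely to need adjusting to match a given evaluation protocol.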