Verifying LLM Reasoning Traces with VeryTrace

Formalizing Reasoning for Verification

Chain-of-Thought (CoT) prompting often fails because a logical error in an early reasoning step propagates through later steps and yields an incorrect conclusion. VeryTrace addresses this by translating natural-language reasoning traces into a structured, compilable Domain-Specific Language (DSL). The formalism forces the model to make step dependencies explicit, converts quantitative content into executable expressions, and organizes semantic inferences into formal deduction schemas. Once the trace is structured rather than free text, the system can treat reasoning as a program and audit it step by step.
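
To make the idea concrete, here is a minimal sketch of what a structured trace could look like. It is not VeryTrace's actual DSL: the `Step` class and its field names (`depends_on`, `expr`, `value`, `schema`) are hypothetical, chosen only to mirror the three properties described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One reasoning step (hypothetical schema, for illustration only)."""
    id: str
    claim: str                                            # the step in natural language
    depends_on: List[str] = field(default_factory=list)   # explicit step dependencies
    expr: Optional[str] = None     # executable expression, for quantitative steps
    value: Optional[float] = None  # the result the model claimed for expr
    schema: Optional[str] = None   # deduction schema name, for semantic steps


# A three-step trace: two quantitative steps and one semantic inference.
trace = [
    Step("s1", "There are 3 boxes with 4 apples each.", expr="3 * 4", value=12),
    Step("s2", "Two apples are eaten.", depends_on=["s1"], expr="s1 - 2", value=10),
    Step("s3", "So fewer than a dozen remain.", depends_on=["s2"], schema="comparison"),
]

# Steps with an expression can be checked by code; the rest need a semantic check.
quantitative = [s.id for s in trace if s.expr is not None]
semantic = [s.id for s in trace if s.expr is None]
```

Because every step names its inputs, a checker can evaluate `s2` knowing only the value of `s1`, which is what makes step-level auditing possible.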

Hybrid Verification and Repair

VeryTrace employs a two-pronged verification strategy to identify and fix errors:

Deterministic Checks: The system automatically verifies computational correctness, dependency resolution, and constraint satisfaction. Because the trace is now in a structured format, these checks can be executed programmatically.
Targeted LLM Audits: For semantic judgments that cannot be mechanized, the framework uses targeted LLM calls to verify the validity of specific reasoning steps.
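
The two prongs above can be sketched as a single pass over a trace. This is an illustrative sketch, not the framework's implementation: the dictionary keys (`id`, `deps`, `expr`, `value`) and the functions `evaluate` and `check_trace` are assumptions. Steps with an expression are recomputed deterministically; steps without one are only collected for a later LLM audit, which is not implemented here.

```python
import ast
import operator

# Arithmetic operators the sketch is willing to evaluate.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(expr, env):
    """Evaluate +, -, *, / over numbers and earlier step ids, without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]          # KeyError if the step id is unknown
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


def check_trace(steps):
    """Return (errors, needs_audit) for a list of step dicts (hypothetical schema)."""
    seen, env = set(), {}
    errors, needs_audit = [], []
    for step in steps:
        sid = step["id"]
        # Dependency resolution: every dependency must be an earlier step.
        for dep in step.get("deps", []):
            if dep not in seen:
                errors.append((sid, "unresolved dependency " + dep))
        expr = step.get("expr")
        if expr is None:
            needs_audit.append(sid)      # semantic step: defer to an LLM audit
        else:
            try:
                actual = evaluate(expr, env)
            except (KeyError, ValueError, SyntaxError, ZeroDivisionError) as exc:
                errors.append((sid, "cannot evaluate: " + repr(exc)))
            else:
                if abs(actual - step["value"]) > 1e-9:
                    errors.append((sid, f"claimed {step['value']}, computed {actual}"))
            # Later steps are checked against the *claimed* value, so one
            # arithmetic slip is reported once, at the step where it occurred.
            env[sid] = step["value"]
        seen.add(sid)
    return errors, needs_audit


steps = [
    {"id": "s1", "expr": "3 * 4", "value": 12},
    {"id": "s2", "deps": ["s1"], "expr": "s1 - 2", "value": 9},    # slip: should be 10
    {"id": "s3", "deps": ["s2"], "expr": "s2 * 2", "value": 18},   # consistent with s2's claim
    {"id": "s4", "deps": ["s3"], "claim": "So the total is even."},
]
errors, needs_audit = check_trace(steps)
# errors      -> [("s2", "claimed 9, computed 10")]
# needs_audit -> ["s4"]
```

Checking each step against the claimed values of its inputs is a design choice in this sketch: it reports only `s2`, the step where the slip occurred, even though `s3` inherits the wrong number.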

This hybrid approach allows the system to localize errors to specific steps within the chain. Once an error is identified, the framework supports targeted repair of the faulty step, preventing the silent propagation of hallucinations or logical fallacies. VeryTrace improves performance across diverse domains, including competition mathematics, robotics planning, and kinship reasoning, without requiring additional training or in-context examples, which suggests that structured verification is a viable path toward more robust LLM reasoning.
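
The repair procedure itself is not detailed here. One plausible way to keep repair targeted, shown purely as an assumption, is to use the explicit dependencies to find the faulty step and everything downstream of it, and to regenerate only those steps. The function `affected_steps` and the `deps` mapping below are hypothetical.

```python
def affected_steps(deps, faulty):
    """Return the faulty step plus every step that transitively depends on it.

    `deps` maps step id -> list of parent ids and must be in trace order
    (parents before children), so a single forward pass is enough.
    """
    dirty = {faulty}
    for sid, parents in deps.items():
        if any(p in dirty for p in parents):
            dirty.add(sid)
    return [sid for sid in deps if sid in dirty]


# Hypothetical trace: s4 uses s2 and s3; s5 uses only s3.
deps = {"s1": [], "s2": ["s1"], "s3": ["s1"], "s4": ["s2", "s3"], "s5": ["s3"]}
to_redo = affected_steps(deps, "s2")
# to_redo -> ["s2", "s4"]; s1, s3 and s5 are left untouched
```

Under this assumption, a fault in `s2` leaves three of the five steps untouched, so the verified parts of the trace do not need to be regenerated.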
