Mask-Proof: Automated Data Curation for Mathematical Proofs

The Challenge of Mathematical Data Curation

Mathematical reasoning remains a frontier for Large Language Models (LLMs), largely because high-quality, verified proof data is difficult to source. Existing datasets often suffer from noise, a lack of formal structure, or insufficient verification, all of which limit a model's ability to carry out complex logical derivations. The Mask-Proof pipeline addresses this with a systematic, LLM-driven approach to curating and refining mathematical proofs at scale.

The Mask-Proof Pipeline Architecture

Mask-Proof functions as an automated data curation framework that leverages the reasoning capabilities of LLMs to filter, verify, and structure mathematical content. By implementing a multi-stage pipeline, the system identifies potentially valid proofs, masks critical logical steps to test the model's internal reasoning, and validates the output against formal or semi-formal constraints. This process effectively converts raw, unstructured mathematical text into high-fidelity training data that is better suited for fine-tuning models in domain-specific reasoning tasks.
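
The stages described above can be illustrated with a minimal sketch. The article does not specify how Mask-Proof selects candidates, chooses which step to mask, or checks the result, so everything here is an assumption: the `Proof` type, the `[MASK]` token, the minimum-step filter, the middle-step heuristic, and the rule that a proof is kept only when its masked step is recovered and passes validation are all hypothetical. The LLM and the formal or semi-formal checker are passed in as plain callables (`reconstruct`, `validate`) rather than tied to any real library.

```python
from dataclasses import dataclass
from typing import Callable, List

MASK = "[MASK]"  # hypothetical mask token


@dataclass
class Proof:
    statement: str
    steps: List[str]


def is_candidate(proof: Proof, min_steps: int = 2) -> bool:
    """Stage 1 (assumed filter): keep proofs with a statement and enough steps."""
    return bool(proof.statement.strip()) and len(proof.steps) >= min_steps


def mask_step(proof: Proof, index: int) -> List[str]:
    """Stage 2: hide one step behind the mask token, leaving the others intact."""
    return [MASK if i == index else s for i, s in enumerate(proof.steps)]


def curate(
    proofs: List[Proof],
    reconstruct: Callable[[str, List[str]], str],  # stands in for an LLM call
    validate: Callable[[str, str], bool],          # stands in for a proof checker
    choose_index: Callable[[Proof], int] = lambda p: len(p.steps) // 2,
) -> List[Proof]:
    """Stage 3 (assumed rule): keep a proof only if its masked step is recovered."""
    kept = []
    for proof in proofs:
        if not is_candidate(proof):
            continue
        idx = choose_index(proof)  # hypothetical heuristic: mask the middle step
        guess = reconstruct(proof.statement, mask_step(proof, idx))
        if validate(guess, proof.steps[idx]):
            kept.append(proof)
    return kept
```

In practice `reconstruct` would wrap a model call and `validate` a formal checker or rubric; passing them in as functions keeps the pipeline logic testable with simple stubs.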

Impact on Model Reasoning

By automating curation, Mask-Proof reduces reliance on manual data labeling, which is both expensive and prone to human error. Generating 'masked' versions of proofs forces models to reconstruct the missing logical steps; because the hidden step is taken from the proof itself, this acts as a form of self-supervised learning that improves the model's grasp of mathematical syntax and logical flow. The approach is particularly effective for scaling up datasets for models that require rigorous adherence to mathematical axioms and deductive consistency.
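
To make the self-supervised setup concrete, the sketch below turns one proof into several training pairs by hiding one step at a time. The prompt layout, the `[MASK]` token, and the `masked_examples` helper are hypothetical and not taken from Mask-Proof. The target for each pair is simply the hidden step, so no human label is needed.

```python
from typing import List, Tuple

MASK = "[MASK]"  # hypothetical mask token


def masked_examples(statement: str, steps: List[str]) -> List[Tuple[str, str]]:
    """Return one (prompt, target) pair per step, with that step masked.

    The prompt format here is an illustrative assumption.
    """
    pairs = []
    for i, target in enumerate(steps):
        shown = [MASK if j == i else s for j, s in enumerate(steps)]
        lines = [f"Theorem: {statement}", "Proof:"]
        lines += [f"{j + 1}. {s}" for j, s in enumerate(shown)]
        pairs.append(("\n".join(lines), target))  # target comes from the proof itself
    return pairs


if __name__ == "__main__":
    for prompt, target in masked_examples("x = 2 implies x + 1 = 3",
                                          ["Assume x = 2", "Then x + 1 = 3"]):
        print(prompt)
        print("target:", target)
```

A proof with n steps yields n examples here; masking only the "critical" steps, as the article describes, would need an extra selection rule that this sketch does not attempt to specify.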
