The Shift to Harness-Centric AI

Probably, a startup that recently raised $9 million, is challenging the industry habit of relying solely on massive frontier models to fix reliability problems. Founder Peter Elias argues that the key to 99.99% accuracy in precision-sensitive fields such as data science, accounting, and medicine is not just better models but better "harness engineering": the validators, context, and constraints built around the model.

Reducing Ambiguity with Deterministic Validators

The company's approach involves wrapping LLMs in a "data science mech suit." Instead of trusting the model's own probabilistic judgment to get an answer right, the system runs a deterministic validator that checks the LLM's output against the source dataset and rejects any output that fails. Training the LLM specifically against this validator reduces the ambiguity the model must navigate.
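
To make the validate-and-reject idea concrete, here is a minimal Python sketch. It is not Probably's implementation, which the source does not describe. It assumes the model's answer can be expressed as a structured claim about a summary statistic, and every name in it (Claim, validate, answer_with_harness, fake_model) is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Claim:
    """A structured answer proposed by the model (hypothetical format)."""
    column: str
    statistic: str  # one of the keys in STATISTICS
    value: float


# Deterministic recomputations the validator is allowed to perform.
STATISTICS: Dict[str, Callable[[List[float]], float]] = {
    "mean": lambda xs: sum(xs) / len(xs),
    "sum": sum,
    "min": min,
    "max": max,
}


def validate(claim: Claim, dataset: Dict[str, List[float]]) -> bool:
    """Recompute the claimed statistic from the source data and compare."""
    if claim.statistic not in STATISTICS or claim.column not in dataset:
        return False  # claims about unknown columns or statistics are rejected
    values = dataset[claim.column]
    if not values:
        return False
    expected = STATISTICS[claim.statistic](values)
    return math.isclose(claim.value, expected, rel_tol=1e-9)


def answer_with_harness(
    generate: Callable[[int], Claim],
    dataset: Dict[str, List[float]],
    max_attempts: int = 3,
) -> Optional[Claim]:
    """Ask the model up to max_attempts times; return the first valid claim."""
    for attempt in range(max_attempts):
        claim = generate(attempt)
        if validate(claim, dataset):
            return claim
    return None  # nothing passed validation, so nothing is returned


# Usage with a stand-in for the LLM: wrong on the first try, right on the second.
dataset = {"revenue": [100.0, 250.0, 150.0]}
candidates = [
    Claim("revenue", "mean", 160.0),
    Claim("revenue", "mean", 500.0 / 3),
]


def fake_model(attempt: int) -> Claim:
    return candidates[min(attempt, len(candidates) - 1)]


result = answer_with_harness(fake_model, dataset)
print(result)
```

The point of the sketch is that acceptance depends only on a recomputation from the source data, never on the model's confidence. The same pass/fail result could plausibly serve as a training signal, though the source does not say how the training against the validator is done.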

Efficiency Through Smaller Models

A core insight from this approach is that better harness engineering allows the use of significantly smaller AI models. Because the harness refines the context and constraints, the model does not need to perform complex reasoning and can run on local hardware rather than in expensive data centers. This strategy directly addresses rising token costs and offers a path to production-ready AI that is both cheaper to operate and more reliable than current frontier-model implementations.