Building Interoperable Standards for Advanced AI Systems

The Need for a Shared Technical Language

As AI models increase in capability, the industry faces a critical gap: the lack of a standardized, interoperable "trust layer" that allows third parties, governments, and organizations to verify safety claims. Current safety frameworks often exist in silos, making it difficult for international institutions to compare risk findings or validate conformity across different parts of the AI value chain.

To address this, OpenAI helped launch the Appia Foundation, hosted by the Linux Foundation. Appia’s primary objective is to develop open, modular specifications that translate abstract international standards into practical, reusable assessment criteria. By creating a shared technical language, Appia aims to enable national AI safety institutes and independent auditors to trust and verify each other's work, ensuring that safety evidence is consistent regardless of which organization developed the model or infrastructure.

Operationalizing Safety Through Standardized Evals

Effective governance requires moving beyond broad policy commitments toward rigorous, transparent, and reproducible evaluation practices. OpenAI emphasizes that frontier assessments must disclose specific technical details to be considered credible, including:

System Configuration: The specific model version and architecture tested.
Tooling & Harnesses: The evaluation environment, including tool access and the specific harness used to elicit capabilities.
Methodology: The resources allocated and the specific checks performed to validate results.
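
As a minimal sketch, the three disclosure categories above could be captured in a machine-readable record that an auditor checks for completeness. Everything here is an assumption made for illustration: the `EvalDisclosure` class, its field names, and the placeholder values are hypothetical and are not drawn from any Appia, CAISI, or AISI specification.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EvalDisclosure:
    """Hypothetical disclosure record; field names are illustrative only."""

    # System configuration
    model_version: str
    architecture: str
    # Tooling and harnesses
    harness: str
    tools_enabled: Tuple[str, ...]
    # Methodology
    resources: str
    validation_checks: Tuple[str, ...]

    def missing_fields(self) -> List[str]:
        """Return the names of fields left empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]


def to_json(record: EvalDisclosure) -> str:
    """Serialize a record with sorted keys so two reports can be diffed."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


# Placeholder values, not real evaluation data.
record = EvalDisclosure(
    model_version="example-model-v1",
    architecture="",
    harness="example-harness 0.1",
    tools_enabled=("code_interpreter",),
    resources="placeholder compute budget",
    validation_checks=(),
)
print(record.missing_fields())  # ['architecture', 'validation_checks']
```

A record like this does not make an assessment credible by itself, but a fixed, serializable shape is one way that reports from different organizations could be compared field by field.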

These practices are being refined through partnerships with the U.S. Center for AI Standards and Innovation (CAISI) and the U.K. AI Security Institute (AISI). The collaborations serve as a blueprint for standardizing technical rigor, so that evaluation results can be compared across jurisdictions. They are designed to complement internal governance artifacts, such as OpenAI's Preparedness Framework and Frontier Governance Framework, by making internal safety practices interoperable with, and auditable by, external stakeholders.
