Preparedness Framework
What this is
OpenAI's Preparedness Framework (Beta), dated December 18, 2023, is a "living document" describing OpenAI's processes to track, evaluate, forecast, and protect against catastrophic risks from increasingly powerful models. It is a governance and evaluation framework, not a model card for a specific release. The document covers four tracked risk categories (cybersecurity, CBRN, persuasion, and model autonomy), each rated on a graduated Low/Medium/High/Critical scale. Its central thesis is that "a robust approach to AI catastrophic risk safety requires proactive, science-based determinations of when and how it is safe to proceed with development and deployment."
Capabilities
This document does not describe a specific model's capabilities, benchmarks, modalities, or context window. It is a process and governance framework applicable to frontier model development broadly.
Evaluation methodology
Evaluations are run on pre-mitigation models (base models with optimized prompts, plus fine-tuned versions targeting specific misuse vectors) to capture the "worst known case," and again on post-mitigation models. They are conducted continuously, including before, during, and after training, and are triggered by any ">2x effective compute increase or major algorithmic breakthrough." An illustrative scorecard template is included, but the document states that all specifics are "purely for illustrative purposes and do not reflect the results from real evaluations." Audits by "qualified, independent third-parties" are planned to verify scorecard findings, either by reproducing results or by reviewing methodology.
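The re-evaluation trigger described above can be sketched as a simple check. This is a minimal illustration under stated assumptions, not OpenAI's tooling: the function name, its parameters, and the choice of the last evaluated checkpoint as the baseline are hypothetical, and this summary does not specify how effective compute is measured.

```python
def needs_evaluation(
    last_evaluated_compute: float,
    current_compute: float,
    major_breakthrough: bool = False,
) -> bool:
    """Return True when a new round of evaluations would be triggered.

    Assumption: both compute figures are "effective compute" in the same
    (unspecified) units, and the baseline is the last evaluated checkpoint.
    """
    if last_evaluated_compute <= 0:
        raise ValueError("last_evaluated_compute must be positive")
    # The framework's wording is ">2x", so exactly 2x does not trigger.
    compute_ratio = current_compute / last_evaluated_compute
    return major_breakthrough or compute_ratio > 2.0
```

For example, `needs_evaluation(1.0, 2.5)` returns `True`, while `needs_evaluation(1.0, 2.0)` returns `False` unless `major_breakthrough=True` is passed.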
Safety testing
Four risk categories are tracked: cybersecurity (exploit development and autonomous cyber operations), CBRN (model-assisted creation of chemical, biological, radiological, or nuclear threats), persuasion (model-generated content that convinces people to change their beliefs or actions), and model autonomy (self-replication, self-exfiltration, autonomous AI R&D). OpenAI acknowledges that "the empirical study of catastrophic risk from frontier AI models is nascent" and that current threshold estimates across the Medium-to-Critical range are "speculative." The illustrative scorecard places cybersecurity and CBRN at Low/Low and persuasion and model autonomy at Medium/Low (pre/post-mitigation), but these ratings are placeholders, not real evaluation outputs. An "unknown unknowns" workstream runs in parallel to identify emerging risk categories not yet captured.
Mitigations
Safety baselines are tied to scorecard thresholds: only models with a post-mitigation score of "medium" or below may be deployed, and only models with a post-mitigation score of "high" or below may continue development. If pre-mitigation risk reaches or is forecast to reach "high" in any category, security must be hardened to prevent model exfiltration by the time that risk level is reached. Specific mitigation options include increasing compartmentalization, restricting deployment to trusted users, implementing refusals, redacting training data, and alerting distribution partners. For "critical" pre-mitigation risk, OpenAI requires "dependable evidence that the model is sufficiently aligned that it does not initiate 'critical'-risk-level tasks unless explicitly instructed to do so."
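The baseline rules above amount to threshold checks on an ordered risk scale, which can be sketched as follows. This is a minimal illustration, not OpenAI's implementation: the `RiskLevel` enum, the `baseline_decisions` function, and the result keys are hypothetical names, and taking a model's overall score to be the highest level across categories is an assumption made for the example.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Ordered risk scale; a higher value means higher risk."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def baseline_decisions(
    pre: dict[str, RiskLevel],
    post: dict[str, RiskLevel],
) -> dict[str, bool]:
    """Apply the safety baselines to per-category scorecard ratings.

    Assumption: the model-level score is the worst (highest) category level.
    """
    pre_overall = max(pre.values())
    post_overall = max(post.values())
    return {
        # Deployment requires a post-mitigation score of "medium" or below.
        "may_deploy": post_overall <= RiskLevel.MEDIUM,
        # Further development requires a post-mitigation score of "high" or below.
        "may_continue_development": post_overall <= RiskLevel.HIGH,
        # "High" pre-mitigation risk in any category calls for hardened security.
        "must_harden_security": pre_overall >= RiskLevel.HIGH,
        # "Critical" pre-mitigation risk calls for dependable alignment evidence.
        "needs_alignment_evidence": pre_overall >= RiskLevel.CRITICAL,
    }


# Placeholder ratings from the illustrative scorecard (not real results).
illustrative_pre = {
    "cybersecurity": RiskLevel.LOW,
    "cbrn": RiskLevel.LOW,
    "persuasion": RiskLevel.MEDIUM,
    "model_autonomy": RiskLevel.MEDIUM,
}
illustrative_post = {category: RiskLevel.LOW for category in illustrative_pre}

decisions = baseline_decisions(illustrative_pre, illustrative_post)
```

With the illustrative placeholder ratings, `decisions` permits both deployment and continued development, and neither the security-hardening nor the alignment-evidence condition applies. The check covers current ratings only; the "forecast to reach" clause would need a separate forecasting input that this sketch omits.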
Deployment and access
The framework does not announce a specific model release or license. External researchers and government actors are granted access to frontier model releases to deepen red-teaming and capability testing. Scorecard evaluations are audited by independent third parties at a cadence set by the Safety Advisory Group or on request from OpenAI leadership or the Board of Directors.
Limitations
OpenAI states that current risk-level threshold estimates for "medium" through "critical" are "speculative and will keep being refined as informed by future research." The tracked risk categories are acknowledged as "almost certainly not exhaustive," motivating the standing unknown-unknowns workstream. No specific model performance data is disclosed in this document; all scorecard values are illustrative placeholders.
What's new
This is the initial Beta release of the Preparedness Framework, dated December 18, 2023. The document describes itself as a "living document" intended for frequent updates as evaluations and research develop. No prior version is referenced, so no version delta is available.