o3-mini System Card

Model card · 12,215 words · 53 min read · Mar 31, 2026 · Source
Summary

A 734-word brief of a 12,215-word document. Published by OpenAI. Version dated Mar 31, 2026.
01

What this is

OpenAI o3-mini is a reasoning model released by OpenAI on January 31, 2025, trained with large-scale reinforcement learning to perform chain-of-thought reasoning before responding. It is the latest in the o-series, positioned as a faster counterpart to o1-mini with particular strength in coding tasks. It employs deliberative alignment, a training approach that teaches the model to reason explicitly through safety specifications prior to producing an answer.

02

Capabilities

o3-mini achieves 61% pass@1 on SWE-bench Verified using an internal tool scaffold, and 92% pass@1 on OpenAI Research Engineer interview coding tasks. On MMLU translated into 14 languages, scores range from 0.62 (Yoruba) to 0.83 (Portuguese, Spanish, Italian), representing meaningful multilingual improvement over o1-mini. On the StrongReject jailbreak benchmark, o3-mini scores 0.73 (goodness@0.1), compared to 0.72 for o1-mini and 0.37 for GPT-4o. Context window size is not disclosed in this document.
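
The pass@1 figures above are per-task success rates. This summary does not say how they were estimated. As a minimal sketch, assuming the widely used unbiased pass@k estimator (n samples per task, c of them correct), the metric could be computed as below. The function names and the (n, c) input format are illustrative and do not come from the source.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, c of which are correct.

    Assumption: the standard estimator 1 - C(n - c, k) / C(n, k).
    The source does not state which estimator OpenAI used.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; `results` holds one (n, c) pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the plain fraction of correct samples, so a benchmark-level pass@1 is the mean of those fractions across tasks.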

03

Evaluation methodology

OpenAI evaluated two checkpoints: o3-mini-near-final-checkpoint and the launched model, which shares the same base model but includes small incremental post-training changes. Evaluations spanned disallowed content, jailbreak robustness, hallucination (PersonQA), fairness (BBQ), discrimination coefficients, multilingual MMLU, and Preparedness Framework categories covering cybersecurity, CBRN, persuasion, and model autonomy. External red teaming was conducted pairwise against GPT-4o and o1, with conversations rated by the red teamer, peers, and a third-party labeling company; Gray Swan AI ran a separate jailbreak arena on January 4, 2025. Preparedness evaluations used custom post-training, scaffolding, and prompting to elicit worst-case pre-mitigation capabilities, and all results are explicitly described as lower bounds.

04

Safety testing

OpenAI's Safety Advisory Group classified o3-mini (Pre-Mitigation) as Medium risk overall: Medium for Persuasion, CBRN, and Model Autonomy, and Low for Cybersecurity; post-mitigation risk levels were rated the same "to err on the side of caution." On CBRN, the card states the model "can help experts with the operational planning of reproducing a known biological threat, which meets our medium risk threshold," while the model "does not enable non-experts to create biological threats." On radiological and nuclear weapons, the card states "we believe the post-mitigation o3-mini model cannot meaningfully assist," but notes this assessment "is limited by what we can test." o3-mini is the first model in the o-series to reach Medium on Model Autonomy, driven by SWE-bench Verified performance. Gray Swan Arena attack success rate for o3-mini was 3.6%, comparable to o1-mini at 3.7% and GPT-4o at 4.0%.
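
The overall Medium classification is consistent with taking the highest rating across the four tracked categories. The sketch below illustrates that aggregation under two assumptions that this summary does not spell out: an ordered Low / Medium / High / Critical scale, and a "maximum over categories" rule. The names `Risk`, `O3_MINI_RATINGS`, and `overall_risk` are illustrative.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Assumed ordered risk scale; higher value means higher risk."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Category ratings for o3-mini as reported in the summary above.
O3_MINI_RATINGS = {
    "Persuasion": Risk.MEDIUM,
    "CBRN": Risk.MEDIUM,
    "Model Autonomy": Risk.MEDIUM,
    "Cybersecurity": Risk.LOW,
}


def overall_risk(ratings: dict[str, Risk]) -> Risk:
    """Assumed rule: the overall level is the highest category rating."""
    return max(ratings.values())


print(overall_risk(O3_MINI_RATINGS).name)
```

Under this rule a single Medium category is enough to make the overall rating Medium, which matches the reported outcome even though Cybersecurity is rated Low.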

05

Mitigations

Pre-training mitigations include filtering of harmful training data (including content that could enable CBRN proliferation) and a PII input filter. Deliberative alignment is applied to improve in-context safety reasoning and jailbreak robustness; this process also introduced a new refusal behavior specifically for political persuasion tasks. Instruction Hierarchy training teaches the model to prioritize system messages over developer messages and developer messages over user messages; on the corresponding evals, o3-mini matches or exceeds GPT-4o on most conflict scenarios. Post-deployment mitigations include moderation classifiers for scaled content detection, high-risk monitoring and intel-sharing for cybersecurity threats, live monitoring for influence operations and extremism, and threat model development for self-exfiltration and self-improvement risks.
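
The instruction hierarchy is a trained behavior, not a rule engine, but the ordering it targets (system over developer over user) can be illustrated with a small conflict resolver. This is a sketch only: the `Instruction` type, the `key` field used to detect conflicts, and the `resolve` function are hypothetical and do not describe how the model is implemented.

```python
from dataclasses import dataclass

# Higher number = higher priority, following the ordering described above:
# system > developer > user.
PRIORITY = {"system": 3, "developer": 2, "user": 1}


@dataclass(frozen=True)
class Instruction:
    role: str   # "system", "developer", or "user"
    key: str    # hypothetical label for the behavior being controlled
    value: str  # the requested behavior


def resolve(instructions: list[Instruction]) -> dict[str, Instruction]:
    """For each key, keep the instruction from the highest-priority role.

    Within the same role, a later instruction replaces an earlier one.
    """
    winners: dict[str, Instruction] = {}
    for ins in instructions:
        current = winners.get(ins.key)
        if current is None or PRIORITY[ins.role] >= PRIORITY[current.role]:
            winners[ins.key] = ins
    return winners


messages = [
    Instruction("developer", "language", "reply in French"),
    Instruction("user", "language", "reply in English"),
]
print(resolve(messages)["language"].value)
```

In this example the developer instruction wins the conflict because its role outranks the user role, which is the kind of scenario the conflict evals mentioned above are described as testing.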

06

Deployment and access

o3-mini is deployed in ChatGPT and available via the OpenAI API as of January 31, 2025. API deployments allow developers to include a custom developer message with every user prompt. OpenAI plans to enable internet search and summarization for o3-mini within ChatGPT. The card does not specify a license type or enumerate tiered access controls beyond general API and product availability.

07

Limitations

The card states that all evaluation results "can only be treated as a lower bound of potential model capability, and that additional scaffolding or improved capability elicitation could substantially increase observed performance." Hallucination understanding is described as incomplete, particularly "in domains not covered by our evaluations (e.g., chemistry)." Radiological and nuclear assessments are constrained because the evaluation "did not use or access any U.S. classified information or restricted data," and a comprehensive assessment "will require collaboration with the U.S. Department of Energy." o3-mini scores 0% on the OpenAI PRs replication task, attributed to tool-format confusion and repeated attempts to use a hallucinated bash tool despite multi-shot prompting.

08

What's new

o3-mini is the first OpenAI model to reach Medium risk on the Model Autonomy category under the Preparedness Framework, a threshold crossed due to improved coding and research engineering performance on SWE-bench Verified. The launched checkpoint includes "small incremental post training improvements" over o3-mini-near-final-checkpoint on the same base model. New mitigations introduced specifically for this release include threat model development for self-exfiltration and self-improvement risks, and a new refusal behavior for political persuasion tasks added as part of deliberative alignment updates.

Generated by Claude Sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original document (source SHA 106198cd0161).

Extracted Evaluations (27 results)

Benchmark                       Category      State   Score  Variant                        Source
SWE-bench                       coding        scored  61.0   pass@1, tools scaffold         self-reported
SWE-bench                       coding        scored  48.0   pass@1                         self-reported
SWE-bench                       coding        scored  39.0   pass@1, Agentless              self-reported
Multilingual MMLU               multilingual  scored  83.2   0-shot, Portuguese (Brazil)    self-reported
Multilingual MMLU               multilingual  scored  82.9   0-shot, Italian                self-reported
Multilingual MMLU               multilingual  scored  82.9   0-shot, Spanish                self-reported
Multilingual MMLU               multilingual  scored  82.5   0-shot, French                 self-reported
Multilingual MMLU               multilingual  scored  82.3   0-shot, Chinese (Simplified)   self-reported
Multilingual MMLU               multilingual  scored  82.3   0-shot, Japanese               self-reported
Multilingual MMLU               multilingual  scored  82.2   0-shot, Indonesian             self-reported
Multilingual MMLU               multilingual  scored  81.6   0-shot, Korean                 self-reported
Multilingual MMLU               multilingual  scored  80.7   0-shot, Arabic                 self-reported
Multilingual MMLU               multilingual  scored  80.3   0-shot, German                 self-reported
Multilingual MMLU               multilingual  scored  80.0   0-shot, Hindi                  self-reported
Multilingual MMLU               multilingual  scored  78.6   0-shot, Bengali                self-reported
Multilingual MMLU               multilingual  scored  71.7   0-shot, Swahili                self-reported
Multilingual MMLU               multilingual  scored  61.6   0-shot, Yoruba                 self-reported
ML Interview Coding             other         scored  92.0   pass@1                         self-reported
MLE-bench                       other         scored  37.0   bronze pass@10                 self-reported
Standard Refusal Evaluation     other         scored  1.0    -                              self-reported
Challenging Refusal Evaluation  other         scored  0.8    -                              self-reported
OpenAI PRs                      other         scored  0.0    Post-Mitigation                self-reported
OpenAI PRs                      other         scored  0.0    Pre-Mitigation                 self-reported
XSTest                          safety        scored  88.0   -                              self-reported
BBQ                             safety        scored  1.0    ambiguous                      self-reported
BBQ                             safety        scored  0.7    unambiguous                    self-reported
BBQ                             safety        scored  0.1    P(not stereotyping|ambiguous)  self-reported
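
For readers who want to work with the extracted results programmatically, the Multilingual MMLU rows can be loaded into a small mapping and summarized. The sketch below copies the 14 scores from the table above; the names `MMLU_BY_LANGUAGE` and `summarize` are illustrative, and the summary statistics are simple arithmetic on those rows, not figures reported by the source.

```python
from statistics import mean

# 0-shot Multilingual MMLU scores (percent), copied from the table above.
MMLU_BY_LANGUAGE = {
    "Portuguese (Brazil)": 83.2,
    "Italian": 82.9,
    "Spanish": 82.9,
    "French": 82.5,
    "Chinese (Simplified)": 82.3,
    "Japanese": 82.3,
    "Indonesian": 82.2,
    "Korean": 81.6,
    "Arabic": 80.7,
    "German": 80.3,
    "Hindi": 80.0,
    "Bengali": 78.6,
    "Swahili": 71.7,
    "Yoruba": 61.6,
}


def summarize(scores: dict[str, float]) -> dict[str, object]:
    """Return the mean score, the extreme languages, and the score spread."""
    lowest = min(scores, key=scores.get)
    highest = max(scores, key=scores.get)
    return {
        "count": len(scores),
        "mean": round(mean(scores.values()), 1),
        "lowest": lowest,
        "highest": highest,
        "spread": round(scores[highest] - scores[lowest], 1),
    }


print(summarize(MMLU_BY_LANGUAGE))
```

Note that the table mixes scales: most rows are percentages, while the refusal and BBQ rows are fractions between 0 and 1, so scores from different benchmarks should not be averaged together without normalizing first.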