
GPT-5.5 System Card

Model card · 14,567 words · 63 min read · May 3, 2026
Summary
14,567-word document condensed to 190 words. OpenAI · May 3, 2026
TL;DR

GPT-5.5 is a new model designed for complex, real-world work, including writing code, researching online, analyzing information, creating documents and spreadsheets, and moving across tools to get things done. Relative to earlier models, GPT-5.5 understands the task earlier, asks for less guidance, uses tools more effectively, checks its work, and keeps going until it’s done.

Top benchmarks
| Benchmark | Variant | Score |
| --- | --- | --- |
| Apollo Sandbagging QA | - | 100.0% |
| UK AISI Lower-Difficulty Cyber Tasks | - | 100.0% |
| Atomic Challenges (Irregular) | - | 100.0% |
| Apollo Strategic Deception Capability Sandbagging | - | 99.6% |
| UK AISI Expert Narrow Cyber Tasks | - | 90.5% |
| UK AISI Expert Narrow Cyber Tasks | - | 71.4% |
| Evasion Challenges (Irregular) | - | 54.0% |
| Biochemistry Knowledge | - | 39.3% |

Showing top 8 of 59. See full list below.
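A "top N" list like the one above can be sketched as a filter followed by a descending sort. This is a minimal illustration, assuming the page ranks by raw score and skips entries that have no score; the `rows` sample and the `top_k` helper are hypothetical names, not part of the source, and the actual selection logic is not documented.

```python
# Hypothetical sample: (benchmark, score) pairs taken from the extracted list.
# A score of None stands for an entry that is only cited or mentioned.
rows = [
    ("Biochemistry Knowledge", 39.3),
    ("Apollo Sandbagging QA", 100.0),
    ("BFCL", None),
    ("Evasion Challenges (Irregular)", 54.0),
    ("UK AISI Expert Narrow Cyber Tasks", 90.5),
]


def top_k(rows, k):
    """Return the k highest-scoring rows, ignoring rows without a score."""
    scored = [row for row in rows if row[1] is not None]
    # sorted() is stable, so tied scores keep their original relative order.
    return sorted(scored, key=lambda row: row[1], reverse=True)[:k]


top = top_k(rows, 3)
for name, score in top:
    print(f"{name}: {score}")
```

The extracted scores may not share one scale (some values sit near 100, others near 1.0), so a raw sort like this one does not necessarily compare like with like.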

Capability claim
  • We are releasing GPT-5.5 with our strongest set of safeguards to date, designed to reduce misuse while preserving legitimate, beneficial uses of advanced capabilities.
Mitigations
  • we have deployed an expanded set of safeguards to restrict the ability of malicious actors to benefit from increased capabilities in cybersecurity performance (section link to Cyber Safeguards section).
  • we trained GPT-5.5 to refuse requests that clearly enable unauthorized, destructive, or harmful actions, including areas such as malware deployment, credential theft, and exfiltration.
Deployment scope
  • available on the internet, information that we partner with third parties to access, and information that our users or human trainers and researchers provide or generate.

Every italicized passage is a verbatim substring of the source document (checked deterministically after extraction). Field selection is heuristic — some quotes may lack surrounding context and some claims may be absent if no matching pattern appeared. For citation, open the source: original model card · source SHA d047a83321e0 · version dated May 3, 2026.
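The deterministic check described above can be sketched as an exact substring test. This is a minimal illustration, assuming a plain character-for-character match with no whitespace or punctuation normalization; `verify_quotes` is a hypothetical helper name, and the page's actual checker is not documented.

```python
def verify_quotes(quotes, source_text):
    """Split quotes into those found verbatim in source_text and those not."""
    verified = [q for q in quotes if q in source_text]
    rejected = [q for q in quotes if q not in source_text]
    return verified, rejected


# Sample source sentence taken from the capability claim quoted on this page.
source_text = (
    "We are releasing GPT-5.5 with our strongest set of safeguards to date, "
    "designed to reduce misuse while preserving legitimate, beneficial uses "
    "of advanced capabilities."
)

quotes = [
    "our strongest set of safeguards to date",  # exact substring
    "our strongest safeguards to date",         # paraphrase, not verbatim
]

verified, rejected = verify_quotes(quotes, source_text)
print("verified:", verified)
print("rejected:", rejected)
```

An exact match is strict by design: a quote that differs only in a curly versus straight apostrophe would be rejected, which is the behavior expected of a verbatim guarantee.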

Extracted Evaluations (59 results)

| Benchmark | Category | State | Score | Variant | Source |
| --- | --- | --- | --- | --- | --- |
| BFCL | coding | cited | | - | self-reported |
| SWE-bench Verified | coding | cited | | - | self-reported |
| MMLU-Pro | general_knowledge | cited | | - | self-reported |
| HealthBench | medical | mentioned | | - | self-reported |
| Apollo Sandbagging QA | other | scored | 100.0 | - | self-reported |
| UK AISI Lower-Difficulty Cyber Tasks | other | scored | 100.0 | - | self-reported |
| Atomic Challenges (Irregular) | other | scored | 100.0 | - | self-reported |
| Apollo Strategic Deception Capability Sandbagging | other | scored | 99.6 | - | self-reported |
| UK AISI Expert Narrow Cyber Tasks | other | scored | 90.5 | - | self-reported |
| UK AISI Expert Narrow Cyber Tasks | other | scored | 71.4 | - | self-reported |
| Evasion Challenges (Irregular) | other | scored | 54.0 | - | self-reported |
| Biochemistry Knowledge | other | scored | 39.3 | - | self-reported |
| Biochemistry Knowledge | other | scored | 32.3 | - | self-reported |
| Biochemistry Knowledge | other | scored | 31.0 | - | self-reported |
| Apollo Impossible Coding Task | other | scored | 29.0 | - | self-reported |
| CyScenarioBench | other | scored | 26.0 | - | self-reported |
| Apollo Verbalized Alignment Evaluation Awareness | other | scored | 22.1 | - | self-reported |
| Apollo Verbalized Alignment Evaluation Awareness | other | scored | 17.3 | - | self-reported |
| DNA Sequence Design for Transcription Factor Binding | other | scored | 16.5 | - | self-reported |
| DNA Sequence Design for Transcription Factor Binding | other | scored | 13.8 | - | self-reported |
| DNA Sequence Design for Transcription Factor Binding | other | scored | 12.8 | - | self-reported |
| Apollo Verbalized Alignment Evaluation Awareness | other | scored | 11.7 | - | self-reported |
| Apollo Impossible Coding Task | other | scored | 10.0 | - | self-reported |
| CyScenarioBench | other | scored | 9.0 | - | self-reported |
| Apollo Impossible Coding Task | other | scored | 7.0 | - | self-reported |
| ML Training Bug Diagnosis | other | scored | 5.8 | - | self-reported |
| Hard Negative Protein Binding Prediction | other | scored | 3.5 | - | self-reported |
| Prompt Injection Attacks in Connectors | other | scored | 1.0 | - | self-reported |
| Cyber Safety Training - Synthetic Data | other | scored | 1.0 | - | self-reported |
| Cyber Safety Training - Synthetic Data | other | scored | 1.0 | - | self-reported |
| Prompt Injection Attacks in Connectors | other | scored | 1.0 | - | self-reported |
| Cyber Safety Training - Synthetic Data | other | scored | 1.0 | - | self-reported |
| Cyber Safety Training - Production Data | other | scored | 1.0 | - | self-reported |
| Prompt Injection Attacks in Connectors | other | scored | 1.0 | - | self-reported |
| Cyber Safety Training - Production Data | other | scored | 1.0 | - | self-reported |
| Cyber Safety Training - Production Data | other | scored | 0.9 | - | self-reported |
| Apollo Sabotage Tasks | other | scored | 0.7 | - | self-reported |
| Prompt Injection Attacks in Connectors | other | scored | 0.6 | - | self-reported |
| CoT-Control | other | scored | 0.5 | - | self-reported |
| Hard Negative Protein Binding Prediction | other | scored | 0.4 | - | self-reported |
| CoT-Control | other | scored | 0.3 | - | self-reported |
| CoT-Control | other | scored | 0.2 | - | self-reported |
| First Person Fairness Evaluation | other | scored | 0.0 | - | self-reported |
| First Person Fairness Evaluation | other | scored | 0.0 | - | self-reported |
| First Person Fairness Evaluation | other | scored | 0.0 | - | self-reported |
| First Person Fairness Evaluation | other | scored | 0.0 | - | self-reported |
| Hard Negative Protein Binding Prediction | other | scored | 0.0 | - | self-reported |
| Nucleobench | other | cited | | - | self-reported |
| StrongReject | other | mentioned | | - | self-reported |
| HLE | other | cited | | - | self-reported |
| TroubleshootingBench | other | mentioned | | - | self-reported |
| ProtocolQA Open-Ended | other | cited | | - | self-reported |
| Vulnerability Discovery Benchmark (CAISI) | other | mentioned | | - | self-reported |
| CTF Challenges (CAISI) | other | mentioned | | - | self-reported |
| HealthBench Professional | other | mentioned | | - | self-reported |
| Multiturn Jailbreak Evaluation | other | mentioned | | - | self-reported |
| ABLE | other | mentioned | | - | self-reported |
| ABC-Bench | other | mentioned | | - | self-reported |
| GPQA | reasoning | cited | | - | self-reported |