Model Cards / Google DeepMind

Gemini 3 Pro Model Card

model card2,179 words·9 min read·Mar 31, 2026·Source
Summary

Gemini 3 Pro Model Card

A 622-word brief of a 2,179-word document. Published by Google DeepMind. Version dated Mar 31, 2026.
01

What this is

Gemini 3 Pro is Google DeepMind's most advanced model for complex tasks, released November 2025, superseding Gemini 2.5 Pro. It is a sparse mixture-of-experts (MoE) transformer with native multimodal support for text, audio, images, video, and entire code repositories. It is not a modification or fine-tune of a prior model. An optional Deep Think mode is available at inference time to enhance complex problem-solving performance.

02

Capabilities

The model accepts text, images, audio, and video inputs with a 1M token context window and produces up to 64K tokens of text output. It is designed for agentic performance, advanced coding, long-context understanding, algorithmic development, and multilingual tasks. Benchmark tables are referenced in the card but numeric scores are not reproduced in the source text; the lab states Gemini 3 Pro "significantly outperforms Gemini 2.5 Pro across a range of benchmarks requiring enhanced reasoning and multimodal capabilities."

03

Evaluation methodology

The model was evaluated across reasoning, multimodal, agentic tool use, multilingual, and long-context benchmarks, with full methodology published separately at deepmind.com/models/evals-methodology/gemini-3-pro. Safety evaluations combined continuous automated pipelines during and after training with human red teaming conducted by specialist teams external to the model development team. The lab notes that updated evaluation methods mean results are not directly comparable to scores reported in previous Gemini model cards.

04

Safety testing

Testing included human red teaming by external specialist teams, automated red teaming at scale, continuous training-phase evaluations, and pre-release Ethics and Safety Reviews aligned with Google's AI Principles and Frontier Safety Framework (FSF, September 2025). FSF evaluations found no critical capability levels (CCLs) reached across CBRN, cybersecurity, harmful manipulation, ML R&D, and misalignment domains. On cybersecurity, the model solved 11/12 v1 hard challenges but 0/13 v2 challenges end-to-end; the alert threshold was met but the CCL was not reached. On CBRN, the model "provides accurate and occasionally actionable information but generally fails to offer novel or sufficiently complete and detailed instructions to significantly enhance the capabilities of low to medium resourced threat actors." Deep Think mode evaluations yielded results consistent with the base model assessment across both safety and FSF domains.

05

Mitigations

Mitigations span the full lifecycle: dataset filtering, conditional pre-training, supervised fine-tuning, reinforcement learning from human and critic feedback, safety policies and desiderata, and product-level safety filtering. Automated safety evaluations show a -10.4% regression in text-to-text safety versus Gemini 2.5 Pro, though manual review confirmed losses were "overwhelmingly either a) false positives or b) not egregious." Child safety evaluations satisfied required launch thresholds developed by expert teams. The main acknowledged residual risks are jailbreak vulnerability ("improved compared to Gemini 2.5 Pro but still an open research problem") and possible degradation in multi-turn conversations.

06

Deployment and access

Gemini 3 Pro is distributed via the Gemini App, Google Cloud/Vertex AI, Google AI Studio, the Gemini API, Google AI Mode, and Google Antigravity. No specific hardware or software is required to use the model. Use is governed by the Gemini API Additional Terms of Service, the Google Cloud Platform Terms of Service, and Google's Generative AI Prohibited Use Policy, which bars integration into systems engaging in dangerous, illicit, harmful, or misinformation activities.

07

Limitations

The lab flags hallucinations and occasional slowness or timeout issues as known limitations of the model. The knowledge cutoff date is January 2025. Jailbreak vulnerability and possible degradation in multi-turn conversations are identified as the main open risks.

08

What's new

Gemini 3 Pro (released November 2025, model card updated December 2025) supersedes Gemini 2.5 Pro and is not a modification or fine-tune of a prior model. The release introduces Deep Think mode, an optional inference-time setting for enhanced complex problem-solving. Compared to Gemini 2.5 Pro, the scope of red teaming was expanded to cover more potential issues outside strict policies, and this card provides more detail on training dataset composition and distribution than prior cards in the Gemini series.

Generated by Claude sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original: original document · source SHA 6a12f7b71c0a.

Extracted Evaluations(9 results)

Sort by:9 evals
BenchmarkCategoryStateScoreVariantSource
Cybersecurity Key Skills Benchmarkotherscored91.7v1 hard challengesself-reported
Misalignment Situational Awarenessotherscored27.3exploratoryself-reported
Misalignment Stealth Challengesotherscored25.0exploratoryself-reported
Toneotherscored7.9-self-reported
Unjustified-refusalsotherscored3.7non-egregiousself-reported
Image to Text Safetyotherscored3.1non-egregiousself-reported
Multilingual Safetyotherscored0.2non-egregiousself-reported
Cybersecurity Key Skills Benchmarkotherscored0.0v2 challenges end-to-endself-reported
Text to Text Safetyotherscored-10.4-self-reported