Gemini 3.1 Pro Model Card
What this is
Gemini 3.1 Pro, published in February 2026, is the next iteration in Google DeepMind's Gemini 3 series and is built on Gemini 3 Pro. Google describes it as its most advanced model for complex tasks as of this card's publication date. It is a natively multimodal reasoning model targeting agentic performance, advanced coding, long-context understanding, and algorithmic development.
Capabilities
The model accepts text, images, audio, video, and entire code repositories as input, with a 1M-token context window and a 64K-token output limit. According to the card, the model "significantly outperforms Gemini 3 Pro across a range of benchmarks requiring enhanced reasoning and multimodal capabilities" as of February 2026. The benchmark score table is not reproduced in this document; full results are available at deepmind.google/models/evals-methodology/gemini-3-1-pro. Intended use cases include agentic tool use, advanced coding, multilingual tasks, and long-context multimodal understanding.
Evaluation methodology
Gemini 3.1 Pro was evaluated on reasoning, multimodal, agentic tool-use, multilingual, and long-context benchmarks. Safety evaluations include automated content-safety scoring and manual red teaming by specialist teams outside the model development team. Frontier Safety Framework (FSF) evaluations focused on Deep Think mode across five risk domains; the cyber domain was additionally tested both with and without Deep Think mode because a prior model had reached that domain's alert threshold.
Safety testing
Automated safety evaluations show Gemini 3.1 Pro improving on Gemini 3 Pro in text-to-text safety (+0.10%), multilingual safety (+0.11%), and tone (+0.02%), and regressing on image-to-text safety (-0.33%) and unjustified refusals (-0.08%); manual review found the regressions were "overwhelmingly either a) false positives or b) not egregious." Child safety evaluations met the required launch thresholds.
Under the Frontier Safety Framework, the model remains below the alert thresholds for the CBRN, harmful manipulation, ML R&D, and misalignment critical capability levels (CCLs). In the cyber domain, the model has again reached the alert threshold but "still does not reach the levels of uplift required for the CCL." On harmful manipulation, the maximum odds ratio for belief change was 3.6x relative to a non-AI baseline, unchanged from Gemini 3 Pro. On misalignment stealth evaluations, the model achieves "almost 100%" success on three challenges no prior model had consistently solved, yet its overall performance remains below the alert threshold.
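The 3.6x harmful-manipulation figure is an odds ratio, not a ratio of probabilities. For reference, the standard definition is sketched below, with p_AI and p_base as placeholder symbols for the share of participants whose belief changed in the AI and non-AI conditions. The card does not publish these proportions, and whether its statistic is computed exactly this way (for example, without any model-based adjustment) is an assumption.

```latex
\mathrm{OR} \;=\; \frac{p_{\mathrm{AI}} \,/\, (1 - p_{\mathrm{AI}})}{p_{\mathrm{base}} \,/\, (1 - p_{\mathrm{base}})}
```

As a hypothetical illustration: if 10% of baseline participants changed their belief, the baseline odds are 0.1/0.9 ≈ 0.111. An odds ratio of 3.6 raises the odds to 0.4, which corresponds to a probability of 0.4/1.4 ≈ 28.6%, roughly 2.9 times the baseline probability rather than 3.6 times.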
Mitigations
The card defers full details of risks and mitigations to the Gemini 3 Pro model card and the Gemini 3 Pro Frontier Safety Framework Report. Google states that it "continues to deploy mitigations" in the CBRN and cyber domains. The FSF strategy relies on a "safety buffer" intended to prevent a model from reaching a critical capability level before the next scheduled round of testing; testing runs at a fixed cadence and additionally whenever a significant capability jump is detected.
Deployment and access
Gemini 3.1 Pro is distributed via the Gemini App, Google Cloud/Vertex AI, Google AI Studio, the Gemini API, Google Antigravity, Gemini Enterprise, and NotebookLM. Downstream providers access the model through an API subject to the relevant terms of service. For AI Studio and the Gemini API, the Gemini API Additional Terms of Service apply; for Vertex AI, the Google Cloud Platform Terms of Service apply. No specific hardware or software is required to use the model.
Limitations
The card does not enumerate specific limitations for Gemini 3.1 Pro, directing readers to the Gemini 3 Pro model card for known limitations and acceptable usage. In the CBRN domain, the model "fails to offer novel or sufficiently complete and detailed instructions for critical stages" needed to significantly uplift low-to-medium-resourced threat actors, which the card presents as a capability ceiling rather than a disclosed flaw.
What's new
Gemini 3.1 Pro is a performance update over Gemini 3 Pro, with gains on benchmark tasks requiring enhanced reasoning and multimodal capabilities and on most automated safety and tone metrics. On ML R&D evaluations, it achieves a human-normalized average of 1.27 on RE-Bench versus Gemini 3 Pro's 1.04, and on one challenge it reduced a fine-tuning script's runtime from 300 seconds to 47 seconds, against a human reference of 94 seconds. On misalignment evaluations, it shows stronger situational awareness than Gemini 3 Pro, achieving near-perfect scores on three challenges no prior model had consistently solved.
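To put the ML R&D figures above in proportion, the short sketch below computes plain speedup ratios from the quoted runtimes and the relative change in the RE-Bench average. These are simple ratios derived from the card's numbers, not RE-Bench's own scoring or normalization, which is not reproduced here; the constant and function names are illustrative only.

```python
# Figures quoted in the card's ML R&D results (runtimes in seconds).
BASELINE_RUNTIME_S = 300.0   # starting runtime of the fine-tuning script
MODEL_RUNTIME_S = 47.0       # runtime after Gemini 3.1 Pro's changes
HUMAN_REF_RUNTIME_S = 94.0   # human reference solution's runtime

# Human-normalized RE-Bench averages quoted in the card.
RE_BENCH_31_PRO = 1.27       # Gemini 3.1 Pro
RE_BENCH_3_PRO = 1.04        # Gemini 3 Pro


def speedup(before: float, after: float) -> float:
    """Return how many times faster a runtime of `after` is than `before`."""
    return before / after


# Plain runtime ratios (not RE-Bench normalized scores).
model_speedup = speedup(BASELINE_RUNTIME_S, MODEL_RUNTIME_S)      # about 6.38x
human_speedup = speedup(BASELINE_RUNTIME_S, HUMAN_REF_RUNTIME_S)  # about 3.19x
model_vs_human = speedup(HUMAN_REF_RUNTIME_S, MODEL_RUNTIME_S)    # exactly 2.0x

# Relative change in the RE-Bench average between the two models.
re_bench_gain = (RE_BENCH_31_PRO - RE_BENCH_3_PRO) / RE_BENCH_3_PRO  # about 22%

if __name__ == "__main__":
    print(f"model speedup over starting script: {model_speedup:.2f}x")
    print(f"human reference speedup over starting script: {human_speedup:.2f}x")
    print(f"model runtime vs human reference: {model_vs_human:.2f}x faster")
    print(f"RE-Bench average relative gain: {re_bench_gain:.1%}")
```

Read this way, the model's 47-second script runs about 6.4 times faster than the starting script and twice as fast as the human reference, and the RE-Bench average rises by roughly 22% relative to Gemini 3 Pro.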