Gemini 2.0 Flash Model Card

Model card · 1,529 words · 7 min read · Mar 31, 2026

Summary

A 567-word brief of a 1,529-word document. Published by Google DeepMind. Version dated Mar 31, 2026.

What this is

Gemini 2.0 Flash is a multimodal language model released by Google DeepMind, with its model card published April 15, 2025. It is a member of the Gemini 2.0 series, designed to power agentic systems, and improves upon Gemini 1.5 Flash with enhanced quality at comparable speeds. It is positioned as an upgrade path for Gemini 1.5 Flash users seeking better quality and for Gemini 1.5 Pro users who require lower latency.

Capabilities

Gemini 2.0 Flash accepts text, images, audio, and video inputs within a 1,048,576-token context window and produces text outputs up to 8,192 tokens; image outputs are experimental as of the card's publication date. It scores 77.6% on MMLU-Pro, 90.9% on MATH, 60.1% on GPQA Diamond, 71.7% on MMMU, and 29.9% on SimpleQA, outperforming Gemini 1.5 Pro on most of these benchmarks. The model also supports a Multimodal Live API enabling low-latency bidirectional voice and video interaction, and shows improvements in coding, complex instruction following, and function calling.
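
The token limits above can be turned into a simple pre-flight check. The following is a minimal sketch, not an official client: the function name `fits_limits` is hypothetical, the token counts are assumed to be supplied by the caller, and the input and output limits are assumed to be checked independently (the summary does not say whether output tokens count against the context window).

```python
# Limits as stated in the model card summary.
CONTEXT_WINDOW_TOKENS = 1_048_576  # input context window (2**20 tokens)
MAX_OUTPUT_TOKENS = 8_192          # maximum text output length


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both stated limits.

    Hypothetical helper: token counts must come from the caller, and the
    two limits are treated as independent (an assumption).
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens <= CONTEXT_WINDOW_TOKENS
        and requested_output_tokens <= MAX_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    print(fits_limits(1_000_000, 8_192))  # within both limits
    print(fits_limits(1_048_577, 100))    # input exceeds the context window
```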

Evaluation methodology

Gemini 2.0 Flash was evaluated against a suite of public performance benchmarks, with results compared directly to Gemini 1.5 Flash, Gemini 1.5 Pro, and Gemini 2.0 Flash-Lite. Internal safety evaluations run during training are reported as absolute percentage changes relative to Gemini 1.5 Pro 002: for violation metrics, a negative change means fewer violations, while for tone, a positive change means improvement. Assurance evaluations use held-out prompt sets, kept separate from the model team, to prevent overfitting and preserve their value for release decision-making.

Safety testing

Safety evaluation included human red teaming by specialist teams, automated red teaming at scale, assurance evaluations conducted by teams outside the model development group, and evaluations under Google DeepMind's Frontier Safety Framework (FSF). Google DeepMind's Responsibility and Safety Council (RSC) reviewed ethics and safety assessments and made release decisions. Automated safety results versus Gemini 1.5 Pro 002 show text-to-text safety at -1.0% (lower violations), multilingual safety at -1.0%, and image-to-text at +1.50%, indicating a small regression in that modality, though overall violation rates remained low. The card does not report specific CBRN, cyber, or autonomy-risk evaluation results.
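
The sign convention for these deltas is easy to misread, so here is a minimal sketch of how they might be interpreted. It assumes only what the summary states: the values are absolute changes in violation rate relative to Gemini 1.5 Pro 002, and a negative value means fewer violations. The names `REPORTED_DELTAS` and `interpret_violation_delta` are hypothetical, and the dictionary keys are shortened labels for the three modalities named above.

```python
# Deltas quoted in the summary, in percentage points vs. Gemini 1.5 Pro 002.
REPORTED_DELTAS = {
    "text-to-text": -1.0,
    "multilingual": -1.0,
    "image-to-text": 1.5,
}


def interpret_violation_delta(delta: float) -> str:
    """Map a change in violation rate to a plain-language verdict.

    Lower violation rates are better, so a negative delta is an improvement.
    """
    if delta < 0:
        return "improvement"
    if delta > 0:
        return "regression"
    return "no change"


def summarize(deltas: dict) -> dict:
    """Apply the sign convention to every reported modality."""
    return {name: interpret_violation_delta(d) for name, d in deltas.items()}


if __name__ == "__main__":
    for modality, verdict in summarize(REPORTED_DELTAS).items():
        print(f"{modality}: {REPORTED_DELTAS[modality]:+.1f} pp -> {verdict}")
```

Tone scores use the opposite convention (a positive change is an improvement), so they would need a separate helper rather than this one.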

Mitigations

Safety and responsibility mitigations were applied across the full training and deployment lifecycle. These include dataset filtering, conditional pre-training, supervised fine-tuning, reinforcement learning from human and critic feedback, safety policies and desiderata, and product-level safety filtering. The Gemini 2.0 family displays lower violation rates across most modalities than Gemini 1.5 Pro, which was itself described as a significant improvement over Gemini 1.0.

Deployment and access

Gemini 2.0 Flash is generally available (GA) as of the card's publication date. It is accessible via Google's Gemini API and is intended for real-time streaming and daily task use cases. The card does not specify a license type or explicit access restrictions beyond Google's standard content policies.

Limitations

The card lists hallucinations and difficulty with causal understanding, complex logical deduction, and counterfactual reasoning as known general limitations of the model. The knowledge cutoff is June 2024. The main safety limitations identified are over-refusals, where the model refuses to answer benign prompts, and a refusal tone that can still come across as "preachy," although tone has improved relative to Gemini 1.5.

What's new

Gemini 2.0 Flash introduces a refined architectural design and novel optimization methods on top of the sparse Mixture-of-Experts Transformer used in Gemini 1.5, yielding improvements in training stability and computational efficiency. The Multimodal Live API, which enables low-latency bidirectional voice and video interactions, is new to this generation. Experimental image output, which was not available in Gemini 1.5 Flash, is also introduced.

Extracted Evaluations (58 results)

0/58 rows fully reproducible (0%)

Benchmark	Category	State	Score	Setup	Source
/ v5	coding	scored	34.5	instruction-tuned; missing: shot count, method, language	self-reported
/ v5	coding	scored	34.2	instruction-tuned; missing: shot count, method, language	self-reported
/ v5	coding	scored	30.7	instruction-tuned; missing: shot count, method, language	self-reported
/ v5	coding	scored	28.9	instruction-tuned; missing: shot count, method, language	self-reported
/ global_lite	knowledge	scored	83.4% accuracy	Average; instruction-tuned; missing: shot count, method	self-reported
/ global_lite	knowledge	scored	80.8% accuracy	Average; instruction-tuned; missing: shot count, method	self-reported
/ global_lite	knowledge	scored	78.2% accuracy	Average; instruction-tuned; missing: shot count, method	self-reported
/ pro	knowledge	scored	77.6% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
/ pro	knowledge	scored	75.8% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
/ global_lite	knowledge	scored	73.7% accuracy	Average; instruction-tuned; missing: shot count, method	self-reported
/ pro	knowledge	scored	71.6% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
/ pro	knowledge	scored	67.3% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	math	scored	90.9% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	math	scored	86.8% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	math	scored	86.5% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	math	scored	77.9% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	84.6 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	83.6 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	82.9 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
MRCR/ 1m	other	scored	82.6 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	80.0 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
MRCR/ 1m	other	scored	71.9 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
EgoSchema/ test	other	scored	71.2 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
EgoSchema/ test	other	scored	71.1 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
MRCR/ 1m	other	scored	70.5 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
EgoSchema/ test	other	scored	67.2 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
EgoSchema/ test	other	scored	66.8 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
HiddenMath	other	scored	63.5 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
Bird-SQL/ dev	other	scored	58.7 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
MRCR/ 1m	other	scored	58.0 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
Bird-SQL/ dev	other	scored	57.4 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
HiddenMath	other	scored	55.3 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
Bird-SQL/ dev	other	scored	54.4 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
HiddenMath	other	scored	52.0 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
HiddenMath	other	scored	47.2 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
Bird-SQL/ dev	other	scored	45.6 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
CoVoST2/ 21_lang	other	scored	40.1 bleu	Average; instruction-tuned; missing: shot count, method	self-reported
CoVoST2/ 21_lang	other	scored	39.0 bleu	Average; instruction-tuned; missing: shot count, method	self-reported
CoVoST2/ 21_lang	other	scored	38.4 bleu	Average; instruction-tuned; missing: shot count, method	self-reported
CoVoST2/ 21_lang	other	scored	37.4 bleu	Average; instruction-tuned; missing: shot count, method	self-reported
	other	scored	29.9 accuracy	no-tools; instruction-tuned; missing: shot count, language	self-reported
	other	scored	24.9 accuracy	no-tools; instruction-tuned; missing: shot count, language	self-reported
	other	scored	21.7 accuracy	no-tools; instruction-tuned; missing: shot count, language	self-reported
	other	scored	8.6 accuracy	no-tools; instruction-tuned; missing: shot count, language	self-reported
	other	scored	1.5 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	1.5 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	0.0 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	-1.0 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	-1.0 accuracy	Average; instruction-tuned; missing: shot count, method	self-reported
Frontier Safety Framework	other	mentioned	—	instruction-tuned; missing: shot count, method, language	self-reported
/ diamond	reasoning	scored	60.1% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
/ diamond	reasoning	scored	59.1% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
/ diamond	reasoning	scored	51.5% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
/ diamond	reasoning	scored	51.0% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	vision	scored	71.7% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	vision	scored	68.0% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	vision	scored	65.9% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
	vision	scored	62.3% accuracy	instruction-tuned; missing: shot count, method, language	self-reported
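
The "rows fully reproducible" figure reported for this table can be illustrated with a short sketch. The page does not define how the figure is computed, so this assumes a row counts as fully reproducible only when its Setup column flags no missing field. The `EvalRow` type and `reproducible_fraction` function are hypothetical names, and the three sample rows are copied from the table above rather than covering all 58.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRow:
    benchmark: str
    score: str
    missing: tuple = ()  # setup fields flagged as missing for this row


# Three rows taken from the table above (a sample, not the full table).
SAMPLE_ROWS = [
    EvalRow("MRCR/ 1m", "82.6 accuracy", ("shot count", "method", "language")),
    EvalRow("CoVoST2/ 21_lang", "40.1 bleu", ("shot count", "method")),
    EvalRow("HiddenMath", "63.5 accuracy", ("shot count", "method", "language")),
]


def reproducible_fraction(rows):
    """Return (reproducible, total, fraction).

    Assumption: a row is reproducible only if no setup field is missing.
    """
    total = len(rows)
    reproducible = sum(1 for row in rows if not row.missing)
    fraction = reproducible / total if total else 0.0
    return reproducible, total, fraction


if __name__ == "__main__":
    ok, total, fraction = reproducible_fraction(SAMPLE_ROWS)
    print(f"{ok}/{total} rows fully reproducible ({fraction:.0%})")
```

Under this assumption, every row in the table flags at least the shot count as missing, which is consistent with the reported 0/58.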