Gemini 2.5 Flash Model Card

Model card · 2,895 words · 13 min read · Mar 31, 2026

Summary

A 676-word brief of a 2,895-word document. Published by Google DeepMind. Version dated Mar 31, 2026.

What this is

Gemini 2.5 Flash is a natively multimodal reasoning model from Google DeepMind, positioned as the next iteration in the Gemini 2.0 series. The model card was last updated in December 2025 and covers three output variants: Gemini 2.5 Flash (text), Gemini 2.5 Flash Image, and Gemini 2.5 Flash Audio. Google describes it as its "first fully hybrid reasoning model": developers can toggle thinking on or off and set a thinking budget to balance quality, cost, and latency.
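
To make the thinking toggle and budget concrete, here is a minimal sketch that only builds a request body and sends nothing. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumptions modeled on the Gemini API's JSON request shape, not taken from the model card, and the helper `build_request` is hypothetical. Treating a budget of 0 as "thinking off" is also an assumption to check against the current API reference.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent-style request body (hypothetical helper, nothing is sent)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed field names; a budget of 0 is assumed to disable thinking.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


# Low-latency call with thinking assumed off.
fast = build_request("Summarize this paragraph.", thinking_budget=0)
# Harder task with a larger (illustrative) token budget for thinking.
careful = build_request("Prove the claim step by step.", thinking_budget=2048)

print(json.dumps(fast["generationConfig"]))
print(json.dumps(careful["generationConfig"]))
```

Keeping the budget as an explicit per-request parameter mirrors the quality, cost, and latency trade-off described above. Note that the card itself cautions that adherence to budgets may not be consistent.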

Capabilities

The model accepts text, images, audio, and video inputs with a 1M token context window, and produces text (64K token output), images (32K), or audio (32K) depending on variant. Key benchmark scores for the Preview (09-2025) thinking variant include: GPQA Diamond 80.8%, AIME 2025 75.6%, LiveCodeBench v5 71.7%, MMMU 80.3%, Humanity's Last Exam 13.2%, and FACTS Grounding 87.5%. The architecture is a sparse mixture-of-experts (MoE) transformer with native multimodal support; the MoE design decouples total model capacity from per-token compute cost. Gemini 2.5 Flash Image ranked first on LMArena for both text-to-image and image editing as of August 25, 2025.
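
The claim that a sparse MoE design decouples total capacity from per-token compute can be illustrated with a toy top-k router. This is a generic sketch of sparse routing, not Gemini's actual architecture. The expert count, top-k value, and parameter counts are hypothetical numbers chosen only for illustration.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical sizes, for illustration only.
NUM_EXPERTS = 8
TOP_K = 2
PARAMS_PER_EXPERT = 1_000_000

random.seed(0)
gate_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(gate_logits, TOP_K)

# Capacity grows with the number of experts; per-token compute grows with k.
total_params = NUM_EXPERTS * PARAMS_PER_EXPERT
active_params = TOP_K * PARAMS_PER_EXPERT

print(routes)
print(f"total={total_params:,} active per token={active_params:,}")
```

In this toy setup, adding experts raises `total_params` while `active_params` stays fixed, which is the decoupling the card refers to.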

Evaluation methodology

Gemini results use pass@1 with no majority voting unless stated, run via the AI Studio API across model IDs including gemini-2.5-flash-preview-09-2025, gemini-2.5-flash-preview-05-20, gemini-2.5-flash-preview-04-17, and gemini-2.0-flash with default sampling; smaller benchmarks were averaged over multiple trials to reduce variance. Non-Gemini comparison numbers are sourced from providers' self-reported figures unless otherwise noted; SWE-bench Verified numbers use each provider's own scaffolding. Contamination controls are not explicitly described. Gemini 2.5 Flash Image evaluations used human preference ratings via GenAI-Bench and LMArena alongside automatic prompt-alignment and image-quality metrics.
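
As a rough illustration of the scoring described above, the sketch below computes pass@1 (one sampled answer per problem, no majority voting) and averages it over repeated trials. The outcome data is invented for the example, and the card does not say how many trials were run.

```python
from statistics import mean


def pass_at_1(results):
    """Fraction of problems solved by the single sampled answer."""
    return mean(1.0 if ok else 0.0 for ok in results)


def averaged_pass_at_1(trials):
    """Mean pass@1 across independent trials of the same benchmark."""
    return mean(pass_at_1(t) for t in trials)


# Hypothetical outcomes: 3 trials over the same 4 problems.
trials = [
    [True, False, True, True],
    [True, True, False, True],
    [False, False, True, True],
]

score = averaged_pass_at_1(trials)
print(f"{score:.3f}")  # prints 0.667
```

Averaging over trials matters most on small benchmarks, where a single problem flipping changes the score by several points.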

Safety testing

Safety evaluation types included continuous automated and human training-phase evaluations, human red teaming by specialist teams, automated red teaming at scale, independent assurance evaluations, and ethics and safety reviews prior to release. Testing also followed Google DeepMind's Frontier Safety Framework (FSF); however, rather than running a full frontier safety assessment on Flash directly, the card states that "as Gemini 2.5 Flash is less capable than Gemini 2.5 Pro Preview" the FSF results reported in the Pro Preview card provide sufficient confidence that Flash does not reach critical capability levels. Automated safety evaluations relative to Gemini 2.0 Flash show violation rate reductions: text-to-text safety improved by +9.1%, multilingual safety by +12.0%, and image-to-text safety by +6.0% (all labeled non-egregious). Manual review of flagged losses confirmed they were "overwhelmingly either false positives or not egregious," concentrated around creative-use requests for sexually suggestive or hateful content.

Mitigations

Safety and responsibility measures span the full training and deployment lifecycle. Mitigations include dataset filtering, conditional pre-training, supervised fine-tuning, reinforcement learning from human and critic feedback, safety policies and desiderata, and product-level safety filtering. The card notes that improved instruction-following training makes the model "more willing to engage with prompts that previous models may have incorrectly refused," with ongoing refinement of automated evaluations to reduce false positives and negatives.

Deployment and access

Deployment status is listed as "general availability." The model is accessible via the AI Studio API. No information on licensing terms, pricing tiers, geographic restrictions, or enterprise access controls is disclosed in this document.

Limitations

The card flags general foundation-model limitations, including hallucinations and weaknesses in causal understanding, complex logical deduction, and counterfactual reasoning. Adherence to thinking budgets "may not be consistent." Gemini 2.5 Flash Image may struggle with long-form text rendering and factual representation of fine image details; Gemini 2.5 Flash Audio may exhibit pronunciation errors and voice drift in long multi-turn conversations. The knowledge cutoff for all three variants is January 2025. The main identified safety limitation is tone: the model "will sometimes respond in a way which can come across as 'preachy.'"

What's new

Gemini 2.5 Flash is presented as a successor to Gemini 2.0 Flash, adding hybrid reasoning (toggleable thinking and configurable thinking budgets) as a core differentiator. Native image generation and audio output are new modalities described in this card version. The card notes that evaluation processes and benchmark sets have been updated, making results "not directly comparable with performance results found in previous Gemini model cards"; specifically, the MRCR v2 evaluation shifted to a harder 8-needle version and SimpleQA was replaced with SimpleQA Verified.

Extracted Evaluations (54 results)

0/54 rows fully reproducible (0%)

Benchmark	Category	State	Score	Setup	Source
/ global_lite	knowledge	scored	87.9% accuracy	Average; instruction-tuned; missing: shot count, method	self-reported
LMArena / image_editing_overall	other	scored	1362.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_overall	other	scored	1191.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_character	other	scored	1170.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_overall	other	scored	1170.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_stylization	other	scored	1165.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / text_to_image_overall	other	scored	1147.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_overall	other	scored	1145.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / text_to_image_overall	other	scored	1135.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / text_to_image_overall	other	scored	1129.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_product_recontextualization	other	scored	1128.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_creative	other	scored	1112.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
GenAI-Bench / visual_quality	other	scored	1103.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
GenAI-Bench / visual_quality	other	scored	1094.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_overall	other	scored	1093.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_stylization	other	scored	1091.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / text_to_image_overall	other	scored	1075.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_infographics	other	scored	1067.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_object_environment	other	scored	1064.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_stylization	other	scored	1062.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_character	other	scored	1059.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_creative	other	scored	1057.0 elo	missing: shot count, method, language, training state	self-reported
GenAI-Bench / text_to_image_alignment	other	scored	1053.0 elo	missing: shot count, method, language, training state	self-reported
GenAI-Bench / text_to_image_alignment	other	scored	1046.0 elo	missing: shot count, method, language, training state	self-reported
GenAI-Bench / text_to_image_alignment	other	scored	1042.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_product_recontextualization	other	scored	1032.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_infographics	other	scored	1029.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_object_environment	other	scored	1023.0 elo	missing: shot count, method, language, training state	self-reported
GenAI-Bench / visual_quality	other	scored	1013.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_infographics	other	scored	1012.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_object_environment	other	scored	1010.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_character	other	scored	1010.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_product_recontextualization	other	scored	1009.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_object_environment	other	scored	1002.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / text_to_image_overall	other	scored	988.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_creative	other	scored	983.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_creative	other	scored	968.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_infographics	other	scored	967.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_stylization	other	scored	949.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_product_recontextualization	other	scored	943.0 elo	missing: shot count, method, language, training state	self-reported
GenAI-Bench / text_to_image_alignment	other	scored	937.0 elo	missing: shot count, method, language, training state	self-reported
GenAI-Bench / visual_quality	other	scored	926.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_infographics	other	scored	925.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
GenAI-Bench / text_to_image_alignment	other	scored	922.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_character	other	scored	911.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_object_environment	other	scored	901.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_product_recontextualization	other	scored	888.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_creative	other	scored	879.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
GenAI-Bench / visual_quality	other	scored	864.0 elo	missing: shot count, method, language, training state	self-reported
LMArena / image_editing_character	other	scored	850.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
LMArena / image_editing_stylization	other	scored	733.0 elo	instruction-tuned; missing: shot count, method, language	self-reported
MRCR / 128k	other	scored	52.4 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
MRCR / 1m	other	scored	16.3 accuracy	instruction-tuned; missing: shot count, method, language	self-reported
Frontier Safety Framework	other	mentioned	—	instruction-tuned; missing: shot count, method, language	self-reported
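
For readers unfamiliar with Elo-style scores, the sketch below converts a rating gap into an expected head-to-head preference rate. It uses the conventional logistic Elo formula with a 400-point scale. That choice is an assumption, since this brief does not describe how the leaderboards fit their ratings. The two ratings are taken from the image_editing_overall rows above purely as sample inputs, because the table does not say which model each row belongs to.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win rate of A over B under the conventional Elo formula (assumed scale: 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Sample ratings from two image_editing_overall rows; a 171-point gap.
p = expected_score(1362.0, 1191.0)
print(f"{p:.2f}")  # prints 0.73
```

Under these assumptions, a gap of about 170 points corresponds to the higher-rated entry being preferred in roughly 73% of pairwise comparisons.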