Gemini 2.5 Flash Model Card
What this is
Gemini 2.5 Flash is a natively multimodal reasoning model from Google DeepMind, positioned as the next iteration in the Gemini 2.0 series. The model card was last updated in December 2025 and covers three output variants: Gemini 2.5 Flash (text), Gemini 2.5 Flash Image, and Gemini 2.5 Flash Audio. Google describes it as its "first fully hybrid reasoning model": developers can toggle thinking on or off and set thinking budgets to balance quality, cost, and latency.
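As a rough illustration of the thinking toggle and budget, the sketch below builds a request configuration as a plain dictionary. The card does not specify any field names: `thinkingConfig` and `thinkingBudget` are assumed here from the public Gemini API, the helper `build_generation_config` is hypothetical, and treating a budget of 0 as "thinking off" is likewise an assumption.

```python
def build_generation_config(thinking_budget=None):
    """Build a generation-config dict with an optional thinking budget.

    Hypothetical helper. The keys "thinkingConfig" and "thinkingBudget"
    are assumed from the public Gemini API, not taken from the model card.
    A budget of 0 is assumed to switch thinking off; None leaves the
    model's default behaviour untouched.
    """
    config = {}
    if thinking_budget is not None:
        if thinking_budget < 0:
            raise ValueError("thinking_budget must be non-negative")
        config["thinkingConfig"] = {"thinkingBudget": thinking_budget}
    return config


# Low-latency request: thinking disabled (assumed semantics of 0).
fast_config = build_generation_config(thinking_budget=0)

# Harder task: allow up to 2048 thinking tokens (illustrative value).
careful_config = build_generation_config(thinking_budget=2048)
```

A larger budget would be expected to trade latency and cost for quality, which is the balance the card describes.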
Capabilities
The model accepts text, image, audio, and video inputs with a 1M token context window, and produces text (64K token output), images (32K tokens), or audio (32K tokens) depending on the variant. Key benchmark scores for the Preview (09-2025) thinking variant include: GPQA Diamond 80.8%, AIME 2025 75.6%, LiveCodeBench v5 71.7%, MMMU 80.3%, Humanity's Last Exam 13.2%, and FACTS Grounding 87.5%. The architecture is a sparse mixture-of-experts (MoE) transformer with native multimodal support. Because only a subset of experts is activated for each token, the MoE design decouples total model capacity from per-token compute cost. Gemini 2.5 Flash Image ranked first on LMArena for both text-to-image and image editing as of August 25, 2025.
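The decoupling of capacity from per-token compute can be made concrete with a little arithmetic. The sketch below is illustrative only: the card discloses no parameter counts, expert counts, or routing details, so every number and the function `moe_param_counts` are hypothetical.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for a sparse MoE model.

    shared_params     -- parameters used by every token (attention, embeddings, ...)
    params_per_expert -- parameters in one expert
    num_experts       -- experts available to the router
    top_k             -- experts actually activated for each token
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    total = shared_params + num_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical configuration, NOT Gemini's actual sizes.
total, active = moe_param_counts(
    shared_params=2_000_000_000,
    params_per_expert=500_000_000,
    num_experts=16,
    top_k=2,
)
print(f"total={total:,} active per token={active:,} ratio={active / total:.2f}")
```

Under these made-up numbers the model stores 10 billion parameters but touches only 3 billion per token. Adding experts raises capacity without raising per-token compute, as long as `top_k` stays fixed.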
Evaluation methodology
Gemini results use pass@1 with no majority voting unless stated. They were run via the AI Studio API with default sampling, across model IDs including gemini-2.5-flash-preview-09-2025, gemini-2.5-flash-preview-05-20, gemini-2.5-flash-preview-04-17, and gemini-2.0-flash. Smaller benchmarks were averaged over multiple trials to reduce variance. Non-Gemini comparison numbers are sourced from providers' self-reported figures unless otherwise noted, and SWE-bench Verified numbers use each provider's own scaffolding. Contamination controls are not explicitly described. Gemini 2.5 Flash Image evaluations used human preference ratings via GenAI-Bench and LMArena alongside automatic prompt-alignment and image-quality metrics.
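The scoring scheme described above (a single sampled answer per problem, no majority voting, and averaging over repeated trials on small benchmarks) can be sketched as follows. This is a minimal illustration and not the card's actual evaluation harness; the function names and the toy results are made up.

```python
from statistics import mean


def pass_at_1(results):
    """Fraction of problems solved when each problem gets one sampled answer.

    results -- list of booleans, one per problem (True = correct).
    """
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


def mean_pass_at_1(trials):
    """Average pass@1 over several independent trials of the same benchmark."""
    return mean(pass_at_1(trial) for trial in trials)


# Toy data: three trials of a four-problem benchmark (illustrative only).
trials = [
    [True, False, True, True],
    [True, True, False, True],
    [False, False, True, True],
]
score = mean_pass_at_1(trials)
print(f"mean pass@1 over {len(trials)} trials: {score:.3f}")
```

Averaging matters most when a benchmark has few problems, because a single problem flipping between trials moves the score by a large step (25 points in this toy case).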
Safety testing
Safety evaluations included continuous automated and human evaluations during training, human red teaming by specialist teams, automated red teaming at scale, independent assurance evaluations, and ethics and safety reviews prior to release. Testing also followed Google DeepMind's Frontier Safety Framework (FSF). Rather than running a full frontier safety assessment on Flash directly, however, the card states that, "as Gemini 2.5 Flash is less capable than Gemini 2.5 Pro Preview," the FSF results reported in the Pro Preview card provide sufficient confidence that Flash does not reach critical capability levels. Automated safety evaluations relative to Gemini 2.0 Flash show violation rate reductions: text-to-text safety improved by +9.1%, multilingual safety by +12.0%, and image-to-text safety by +6.0% (all labeled non-egregious). Manual review of flagged losses confirmed they were "overwhelmingly either false positives or not egregious," concentrated around creative-use requests for sexually suggestive or hateful content.
Mitigations
Safety and responsibility measures span the full training and deployment lifecycle. Mitigations include dataset filtering, conditional pre-training, supervised fine-tuning, reinforcement learning from human and critic feedback, safety policies and desiderata, and product-level safety filtering. The card notes that improved instruction-following training makes the model "more willing to engage with prompts that previous models may have incorrectly refused," with ongoing refinement of automated evaluations to reduce false positives and negatives.
Deployment and access
Deployment status is listed as "general availability." The model is accessible via the AI Studio API. No information on licensing terms, pricing tiers, geographic restrictions, or enterprise access controls is disclosed in this document.
Limitations
The card flags general foundation-model limitations, including hallucinations and weaknesses in causal understanding, complex logical deduction, and counterfactual reasoning. Adherence to thinking budgets "may not be consistent." Gemini 2.5 Flash Image may struggle with long-form text rendering and factual representation of fine image details; Gemini 2.5 Flash Audio may exhibit pronunciation errors and voice drift in long multi-turn conversations. The knowledge cutoff for all three variants is January 2025. The main identified safety limitation is tone: the model "will sometimes respond in a way which can come across as 'preachy.'"
What's new
Gemini 2.5 Flash is presented as a successor to Gemini 2.0 Flash, adding hybrid reasoning (toggleable thinking and configurable thinking budgets) as a core differentiator. Native image generation and audio output are new modalities described in this card version. The card notes that evaluation processes and benchmark sets have been updated, making results "not directly comparable with performance results found in previous Gemini model cards"; specifically, the MRCR v2 evaluation shifted to a harder 8-needle version and SimpleQA was replaced with SimpleQA Verified.