Llama 4 Model Card
What this is
Llama 4 is a collection of natively multimodal AI models developed by Meta, released April 5, 2025, succeeding the Llama 3.x series. The release comprises two models: Llama 4 Scout (17B activated / 109B total parameters, 16 experts) and Llama 4 Maverick (17B activated / 400B total parameters, 128 experts). Both use a mixture-of-experts (MoE) auto-regressive architecture with early fusion for native multimodality. The models are designed for commercial and research use across text, image, and code tasks in multiple languages.
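To make "activated" versus "total" parameters concrete, the sketch below computes the share of each model's parameters that participates in a single token's forward pass. It pairs that with a toy top-1 expert router. The router is a generic illustration of mixture-of-experts gating on assumed inputs. The card does not describe Llama 4's routing scheme, and `route_top1` and `moe_layer` are hypothetical names, not Meta's code.

```python
import math

# Parameter counts (billions) and expert counts as stated in the model card.
MODELS = {
    "Llama 4 Scout": {"active_b": 17, "total_b": 109, "experts": 16},
    "Llama 4 Maverick": {"active_b": 17, "total_b": 400, "experts": 128},
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Fraction of all parameters used in one token's forward pass."""
    return active_b / total_b


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_top1(router_logits):
    """Hypothetical top-1 router: return (best expert index, its gate weight).

    Generic MoE illustration only; not Llama 4's actual routing scheme.
    """
    probs = softmax(router_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def moe_layer(x, router_logits, experts):
    """Run only the selected expert on x and scale its output by the gate.

    The other experts' weights are never touched for this token, which is
    why "activated" parameters can be far fewer than "total" parameters.
    """
    idx, gate = route_top1(router_logits)
    return [gate * v for v in experts[idx](x)], idx


if __name__ == "__main__":
    for name, m in MODELS.items():
        frac = active_fraction(m["active_b"], m["total_b"])
        print(f"{name}: {frac:.1%} of parameters active per token")
```

Under this reading, Scout activates roughly 16% of its weights per token and Maverick roughly 4%. This is why the two models have similar per-token compute despite a nearly fourfold difference in total size.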
Capabilities
Both models accept multilingual text and image input and produce multilingual text and code output across 12 supported languages. Scout offers a 10M-token context window; Maverick offers 1M. On instruction-tuned benchmarks, Maverick scores 69.8 on GPQA Diamond, 80.5 on MMLU Pro, 73.4 on MMMU, and 43.4 pass@1 on LiveCodeBench; Scout scores 57.2 on GPQA Diamond, 74.3 on MMLU Pro, and 69.4 on MMMU. Both models score 94.4 ANLS on DocVQA (test). Knowledge cutoff is August 2024.
Evaluation methodology
All reported evaluations were conducted on bf16 models, not quantized checkpoints. Meta built dedicated adversarial evaluation datasets for common use cases (chatbot, visual QA) and evaluated systems composed of Llama models paired with Llama Guard 3 to filter inputs and outputs. Capability-specific benchmarks cover long context, multilingual, coding, and memorization. Recurring red-teaming exercises involve experts in cybersecurity, adversarial machine learning, integrity, and multilingual content from specific geographic markets.
Safety testing
Meta evaluated three critical risk areas. For CBRNE, expert-designed evaluations assessed whether Llama 4 could "meaningfully increase the capabilities of malicious actors to plan or carry out attacks" using chemical, biological, radiological, nuclear, or explosive materials. For child safety, a dedicated expert team assessed model outputs, with benchmarks expanded to cover multi-image and multilingual capabilities. For cyber, threat modeling and capability challenges assessed whether Llama 4 could automate attacks or exploit vulnerabilities; Meta reports "that Llama 4 models do not introduce risk plausibly enabling catastrophic cyber outcomes."
Mitigations
Model-level safeguards include safety fine-tuning on human-generated and synthetic data, with LLM-based classifiers used for data quality control. Meta also reduced refusals of benign prompts and adjusted the model's tone to remove preachy or moralizing language. At the system level, Meta provides Llama Guard, Prompt Guard, and Code Shield as open-source tools and directs developers to deploy them alongside the model; the reference implementation includes these protections by default. No ASL or FSF tier is referenced in this document.
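The system-level pattern described here can be sketched in a few lines: a safety classifier screens the user prompt before generation and screens the model's response before it is returned. The callables `generate`, `check_input`, and `check_output` are hypothetical stand-ins the caller supplies, for example a Llama 4 call and a classifier built on Llama Guard. This is not the API of Llama Guard or of Meta's reference implementation.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    check_input: Callable[[str], bool],
    check_output: Callable[[str, str], bool],
    refusal: str = REFUSAL,
) -> str:
    """Wrap a model call with input and output safety checks.

    All three callables are hypothetical and supplied by the caller:
    generate(prompt) -> response, check_input(prompt) -> is_safe,
    check_output(prompt, response) -> is_safe.
    """
    # 1. Screen the user prompt; unsafe prompts never reach the model.
    if not check_input(prompt):
        return refusal
    # 2. Generate a candidate response.
    response = generate(prompt)
    # 3. Screen the response in the context of the prompt that produced it.
    if not check_output(prompt, response):
        return refusal
    return response


if __name__ == "__main__":
    # Toy stand-ins: a keyword filter in place of a real safety classifier
    # and an echo function in place of the model.
    def keyword_filter(text: str) -> bool:
        return "forbidden" not in text.lower()

    print(guarded_generate(
        "hello",
        generate=lambda p: f"echo: {p}",
        check_input=keyword_filter,
        check_output=lambda p, r: keyword_filter(r),
    ))
```

Passing the prompt to `check_output` reflects that an output classifier typically judges a response in the context of the conversation, not in isolation.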
Deployment and access
Llama 4 is released under the Llama 4 Community License Agreement, a custom commercial license. Scout is released as BF16 weights and can fit on a single H100 GPU with on-the-fly int4 quantization; Maverick is released as both BF16 and FP8 weights, and the FP8 weights fit on a single H100 DGX host. Out-of-scope uses include any use that violates applicable law or the Acceptable Use Policy, and use in languages or capabilities beyond those explicitly supported. Developers who extend use to additional languages or to more than 5 input images are responsible for their own safety testing.
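The single-GPU and single-host claims follow from simple weight-size arithmetic. The sketch below rests on several assumptions. It uses an 80 GB H100 and an 8-GPU DGX H100 host (640 GB total). It counts weights only, with no KV cache, activations, or quantization-scale overhead. It uses 2, 1, and 0.5 bytes per parameter for BF16, FP8, and int4.

```python
# Bytes per parameter for each weight format.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

# Assumptions: the 80 GB H100 variant, and a DGX H100 host with 8 such GPUs.
H100_GB = 80
DGX_H100_GB = 8 * H100_GB  # 640 GB


def weight_gb(total_params_b: float, fmt: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only.

    Billions of parameters times bytes per parameter gives gigabytes.
    """
    return total_params_b * BYTES_PER_PARAM[fmt]


def fits(total_params_b: float, fmt: str, budget_gb: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_gb(total_params_b, fmt) <= budget_gb


if __name__ == "__main__":
    checks = [
        ("Scout", 109, "bf16", H100_GB),
        ("Scout", 109, "int4", H100_GB),
        ("Maverick", 400, "bf16", DGX_H100_GB),
        ("Maverick", 400, "fp8", DGX_H100_GB),
    ]
    for name, params_b, fmt, budget in checks:
        need = weight_gb(params_b, fmt)
        verdict = "fits" if fits(params_b, fmt, budget) else "does not fit"
        print(f"{name} {fmt}: ~{need:.1f} GB of weights, {verdict} in {budget} GB")
```

The arithmetic gives about 218 GB for Scout in BF16 versus 54.5 GB in int4, and 800 GB for Maverick in BF16 versus 400 GB in FP8. This is consistent with int4 Scout fitting on one H100 and FP8 Maverick fitting on one DGX host. Real deployments need additional headroom for the KV cache, which grows with context length.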
Limitations
Meta states that testing "has not covered, nor could it cover, all scenarios" and that Llama 4's "potential outputs cannot be predicted in advance." The model may "produce inaccurate or other objectionable responses to user prompts." Image understanding has been tested for up to 5 input images; use beyond that is at the developer's risk. Pre-training covered approximately 200 languages, but only 12 are officially supported; Meta places responsibility for safe use in additional languages on developers.
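Meta does not block use outside the tested envelope. Instead, it shifts responsibility for such use to developers, so an application might flag these requests for extra safety review. The sketch below shows one hypothetical way to do that. The `Request` shape and the `out_of_envelope` helper are illustrative. The language set encodes, as ISO 639-1 codes, the 12 languages named in Meta's published model card: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.

```python
from dataclasses import dataclass, field
from typing import List

# The 12 officially supported languages from Meta's Llama 4 model card,
# as ISO 639-1 codes (Tagalog = "tl"). Verify against the current card.
SUPPORTED_LANGUAGES = frozenset(
    {"ar", "en", "fr", "de", "hi", "id", "it", "pt", "es", "tl", "th", "vi"}
)
# Image understanding has been tested for up to 5 input images.
MAX_TESTED_IMAGES = 5


@dataclass
class Request:
    """Hypothetical request shape, for illustration only."""
    language: str
    images: list = field(default_factory=list)


def out_of_envelope(req: Request) -> List[str]:
    """Return reasons the request falls outside the tested envelope.

    An empty list means the request is within the tested envelope.
    A non-empty list flags it for additional developer safety testing
    rather than rejecting it outright.
    """
    reasons = []
    if req.language not in SUPPORTED_LANGUAGES:
        reasons.append(f"language {req.language!r} is not officially supported")
    if len(req.images) > MAX_TESTED_IMAGES:
        reasons.append(
            f"{len(req.images)} images exceeds the tested limit of {MAX_TESTED_IMAGES}"
        )
    return reasons


if __name__ == "__main__":
    print(out_of_envelope(Request("en", ["img"] * 3)))
    print(out_of_envelope(Request("sw", ["img"] * 6)))
```

Returning reasons instead of a boolean lets an application log why a request needed extra review.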
What's new
Llama 4 introduces a mixture-of-experts architecture and native (early-fusion) multimodality to the Llama family, both absent from Llama 3.x. Scout's 10M-token context window and Maverick's 1M-token window are roughly 78 and 8 times the 128K context available in Llama 3.1 405B. Beyond this generational transition, the document provides no version changelog or incremental changes.