Model Cards / Meta AI

Llama 2 Model Card

model card956 words·4 min read·Mar 31, 2026·Source
Summary

Llama 2 Model Card

A 542-word brief of a 956-word document. Published by Meta AI. Version dated Mar 31, 2026.
01

What this is

Llama 2 is a family of pretrained and fine-tuned large language models developed and released by Meta AI, trained between January 2023 and July 2023. The family spans 7B, 13B, and 70B parameter sizes. A fine-tuned variant, Llama-2-Chat, is optimized for dialogue and uses supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). Llama 2 succeeds Llama 1 and is positioned as both a commercial and research release.

02

Capabilities

All models accept and produce text only, with a 4k-token context window across all sizes. The 70B model scores 68.9 on MMLU, 37.5 on code (HumanEval/MBPP average), 71.9 on commonsense reasoning, and 35.2 on math benchmarks. Llama-2-Chat 70B scores 64.14 on TruthfulQA and 0.01% on ToxiGen toxic generation rate. Meta reports Llama-2-Chat models are "on par with some popular closed-source models like ChatGPT and PaLM" on human evaluations for helpfulness and safety.

03

Evaluation methodology

Benchmarks were run using Meta's internal evaluations library across grouped categories: code, commonsense reasoning, world knowledge, reading comprehension, math, MMLU, BBH, and AGI Eval. Safety was assessed via TruthfulQA (percentage of truthful and informative generations) and ToxiGen (percentage of toxic generations). Human evaluations were also conducted for helpfulness and safety, comparing Llama-2-Chat against open-source and closed-source chat models. No contamination controls or external third-party evaluators are described in this document.

04

Safety testing

Pretrained models were evaluated on TruthfulQA and ToxiGen; the 70B pretrained model scores 50.18 on TruthfulQA and 24.60% on ToxiGen. Fine-tuned Llama-2-Chat models show substantially lower toxicity, with the 7B and 13B chat variants reaching 0.00% on ToxiGen. The card does not describe red-team exercises, CBRN evaluations, cyber-risk assessments, or autonomy-risk testing. All safety testing reported was conducted in English only.

05

Mitigations

Llama-2-Chat models are aligned via SFT and RLHF to reflect human preferences for helpfulness and safety. Use is governed by a custom Llama 2 Community License and an Acceptable Use Policy that prohibits uses violating laws or regulations. Meta publishes a Responsible Use Guide and states that future tuned-model versions will be released as safety improves with community feedback. No classifier thresholds, refusal-training details, or ASL/FSF tier designations are disclosed in this document.

06

Deployment and access

Llama 2 is available under a custom commercial license at ai.meta.com/resources/models-and-libraries/llama-downloads/. The models are intended for commercial and research use in English; use in other languages is out of scope by default, though developers may fine-tune for other languages if they comply with the license and Acceptable Use Policy. Pretrained models are available for general natural language generation tasks; fine-tuned chat variants target assistant-like dialogue. Meta user data is excluded from both pretraining and fine-tuning datasets.

07

Limitations

"Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios." The card states that "Llama 2's potential outputs cannot be predicted in advance" and that the model "may in some instances produce inaccurate, biased or other objectionable responses." Meta explicitly places responsibility on downstream developers to perform safety testing and tuning tailored to their specific applications before deployment.

08

What's new

Llama 2 succeeds Llama 1; the card includes side-by-side benchmark comparisons showing gains across all evaluated categories at matched parameter sizes. The 70B model adds Grouped-Query Attention (GQA) for improved inference scalability, which was not present in the 7B and 13B variants. No internal version changelog or incremental update history is provided in this document.

Generated by Claude sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original: original document · source SHA e3cbad5a8d0a.

Extracted Evaluations(10 results)

Sort by:10 evals
BenchmarkCategoryStateScoreVariantSource
MMLUgeneral_knowledgescored35.1-self-reported
MATHmathscored7.0-self-reported
Commonsense Reasoningotherscored60.8mixed-shotself-reported
Reading Comprehensionotherscored58.50-shotself-reported
World Knowledgeotherscored46.25-shotself-reported
BBHotherscored30.3-self-reported
AGI Evalotherscored23.9-self-reported
Natural2Codeotherscored14.1-self-reported
TruthfulQAsafetyscored27.4-self-reported
ToxiGensafetyscored23.0-self-reported