Model Cards / Meta AI

Llama 3.2 Model Card

model card4,158 words·18 min read·Mar 31, 2026·Source
Summary

Llama 3.2 Model Card

A 555-word brief of a 4,158-word document. Published by Meta AI. Version dated Mar 31, 2026.
01

What this is

Llama 3.2 is a collection of multilingual large language models (1B and 3B parameters) released by Meta on October 24, 2024. The collection includes pretrained and instruction-tuned text-only variants, plus quantized versions (SpinQuant and QLoRA) designed for on-device deployment. It extends Llama 3.1 to smaller-scale use cases, targeting multilingual dialogue, agentic retrieval, and summarization tasks.

02

Capabilities

The instruction-tuned 3B model scores 63.4 on MMLU (5-shot), 77.7 on GSM8K (CoT), and 77.4 on IFEval; the 1B model scores 49.3, 44.4, and 59.5 on the same benchmarks respectively. Both models support multilingual text input and output across 8 officially supported languages (English, German, French, Italian, Portuguese, Hindi, Spanish, Thai), with a 128k token context window reduced to 8k in quantized variants. Models were pretrained on up to 9 trillion tokens with a knowledge cutoff of December 2023.

03

Evaluation methodology

Meta used an internal evaluations library to run standard automatic benchmarks covering general reasoning, math, instruction following, tool use, long context, and multilingual categories. Dedicated adversarial evaluation datasets were built for safety evaluation and applied to systems composed of Llama models paired with Purple Llama safeguards. The card does not describe contamination controls for the benchmark suite.

04

Safety testing

Red teaming was conducted by experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, including multilingual content specialists with market-specific background. For CBRN risks, uplift testing was performed on Llama 3.1 70B and 405B, and Meta states it "broadly believe[s] that the testing conducted for the 405B model also applies to Llama 3.2 models." Child safety assessments used expert-led, objective-based red teaming across multiple attack vectors and languages. Cyber attack uplift testing evaluated both skill augmentation and fully autonomous offensive agent scenarios, again extrapolated from Llama 3.1 405B results.

05

Mitigations

Instruction-tuned models undergo multiple rounds of SFT, rejection sampling, and DPO using safety-oriented training data that combines human-generated and synthetic examples; LLM-based classifiers select high-quality prompts and responses. Refusal training emphasizes tone guidelines and covers both borderline and adversarial prompts. System-level safeguards—Llama Guard, Prompt Guard, and Code Shield—are released as open-source tools and included by default in Meta's reference implementations. For constrained or mobile environments, Meta recommends Llama Guard 3-1B or its mobile-optimized variant.

06

Deployment and access

Llama 3.2 is governed by the Llama 3.2 Community License, a custom commercial agreement. Models are available for commercial and research use across supported languages, with quantized variants specifically targeting on-device and mobile deployment via the ExecuTorch framework. Use in violation of applicable laws, the Acceptable Use Policy, or in languages beyond those explicitly supported is out of scope.

07

Limitations

Meta states that "testing conducted to date has not covered, nor could it cover, all scenarios," and the model "may in some instances produce inaccurate, biased or other objectionable responses." The 1B and 3B models "will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems." The model is static, trained on an offline dataset with a December 2023 knowledge cutoff, and future outputs cannot be predicted in advance.

08

What's new

Relative to Llama 3.1, the 1B and 3B models introduce knowledge distillation during pretraining, using logits from Llama 3.1 8B and 70B as token-level targets, followed by pruning recovery. New quantized variants—SpinQuant and QLoRA—are released for on-device inference, achieving 2.4–2.6x decode speed improvements and 45–60% memory reductions over BF16 baselines on an Android OnePlus 12 device. The quantization scheme is designed around the ExecuTorch framework with ARM CPU backends.

Generated by Claude sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original: original document · source SHA 1ab343363789.

Extracted Evaluations(21 results)

Sort by:21 evals
BenchmarkCategoryStateScoreVariantSource
Nexusagentscored13.50-shotself-reported
BFCLcodingscored25.70-shotself-reported
ARC-Challengegeneral_knowledgescored59.40-shotself-reported
ARC-Challengegeneral_knowledgescored32.825-shotself-reported
MMLUgeneral_knowledgescored32.25-shotself-reported
Needle-in-a-Haystack Multilong_contextscored75.00-shotself-reported
InfiniteBench En.MClong_contextscored38.00-shotself-reported
GSM8Kmathscored44.48-shot CoTself-reported
MATHmathscored30.60-shot CoTself-reported
MGSMmultilingualscored24.50-shot CoTself-reported
Needle in Haystackotherscored96.80-shotself-reported
SQuADotherscored49.21-shotself-reported
Open-rewrite evalotherscored41.60-shotself-reported
QuACotherscored37.91-shotself-reported
AGIEval Englishotherscored23.33-5 shotself-reported
InfiniteBench/En.QAotherscored20.30-shotself-reported
TLDR9+otherscored16.81-shotself-reported
IFEvalreasoningscored59.50-shotself-reported
HellaSwagreasoningscored41.20-shotself-reported
DROPreasoningscored28.0%3-shotself-reported
GPQAreasoningscored27.20-shotself-reported