Model Card Explorer

Summary

Llama 3.2 Model Card

A 555-word brief of a 4,158-word document. Published by Meta AI. Version dated Mar 31, 2026.

What this is

Llama 3.2 is a collection of multilingual large language models (1B and 3B parameters) released by Meta on October 24, 2024. The collection includes pretrained and instruction-tuned text-only variants, plus quantized versions (SpinQuant and QLoRA) designed for on-device deployment. It extends Llama 3.1 to smaller-scale use cases, targeting multilingual dialogue, agentic retrieval, and summarization tasks.

Capabilities

The instruction-tuned 3B model scores 63.4 on MMLU (5-shot), 77.7 on GSM8K (CoT), and 77.4 on IFEval; the 1B model scores 49.3, 44.4, and 59.5 on the same benchmarks, respectively. Both models support multilingual text input and output across 8 officially supported languages (English, German, French, Italian, Portuguese, Hindi, Spanish, Thai). The context window is 128k tokens, reduced to 8k in the quantized variants. Models were pretrained on up to 9 trillion tokens with a knowledge cutoff of December 2023.
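To make the size tradeoff concrete, the short sketch below computes the 3B-minus-1B gap on each benchmark, using only the scores quoted in this section. The names `SCORES` and `score_gaps` are illustrative and do not come from the model card.

```python
# Scores as quoted in this summary (instruction-tuned models).
SCORES = {
    "MMLU (5-shot)": {"1B": 49.3, "3B": 63.4},
    "GSM8K (CoT)": {"1B": 44.4, "3B": 77.7},
    "IFEval": {"1B": 59.5, "3B": 77.4},
}


def score_gaps(scores):
    """Return the 3B-minus-1B gap per benchmark, rounded to one decimal."""
    return {name: round(s["3B"] - s["1B"], 1) for name, s in scores.items()}


if __name__ == "__main__":
    for name, gap in score_gaps(SCORES).items():
        print(f"{name}: 3B leads 1B by {gap} points")
```

On these figures the gap is largest on GSM8K, which suggests that math reasoning is where the smaller model gives up the most.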

Evaluation methodology

Meta used an internal evaluations library to run standard automatic benchmarks covering general reasoning, math, instruction following, tool use, long context, and multilingual categories. Dedicated adversarial evaluation datasets were built for safety evaluation and applied to systems composed of Llama models paired with Purple Llama safeguards. The card does not describe contamination controls for the benchmark suite.

Safety testing

Red teaming was conducted by experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, including multilingual content specialists with market-specific background. For CBRN risks, uplift testing was performed on Llama 3.1 70B and 405B, and Meta states it "broadly believe[s] that the testing conducted for the 405B model also applies to Llama 3.2 models." Child safety assessments used expert-led, objective-based red teaming across multiple attack vectors and languages. Cyber attack uplift testing evaluated both skill augmentation and fully autonomous offensive agent scenarios, again extrapolated from Llama 3.1 405B results.

Mitigations

Instruction-tuned models undergo multiple rounds of SFT, rejection sampling, and DPO using safety-oriented training data that combines human-generated and synthetic examples; LLM-based classifiers select high-quality prompts and responses. Refusal training emphasizes tone guidelines and covers both borderline and adversarial prompts. System-level safeguards—Llama Guard, Prompt Guard, and Code Shield—are released as open-source tools and included by default in Meta's reference implementations. For constrained or mobile environments, Meta recommends Llama Guard 3-1B or its mobile-optimized variant.
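For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not Meta's training code: the summary does not describe the exact recipe, and the function names, the `beta` value, and the example log-probabilities are all assumptions made for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under a frozen reference model. beta=0.1 is an
    illustrative value, not one taken from the model card.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)


if __name__ == "__main__":
    # Hypothetical log-probabilities: the policy prefers the chosen
    # response more strongly than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls below log 2 once the policy favors the chosen response more than the reference model does, which is the pressure that pushes the model toward preferred (for example, safer) completions.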

Deployment and access

Llama 3.2 is governed by the Llama 3.2 Community License, a custom commercial agreement. Models are available for commercial and research use across supported languages, with quantized variants specifically targeting on-device and mobile deployment via the ExecuTorch framework. Use in violation of applicable laws, the Acceptable Use Policy, or in languages beyond those explicitly supported is out of scope.

Limitations

Meta states that "testing conducted to date has not covered, nor could it cover, all scenarios," and the model "may in some instances produce inaccurate, biased or other objectionable responses." The 1B and 3B models "will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems." The model is static, trained on an offline dataset with a December 2023 knowledge cutoff, and its potential outputs cannot be predicted in advance.

What's new

Relative to Llama 3.1, the 1B and 3B models introduce knowledge distillation during pretraining: logits from Llama 3.1 8B and 70B serve as token-level targets, and distillation is applied after pruning to recover performance. New quantized variants (SpinQuant and QLoRA) are released for on-device inference, achieving 2.4–2.6x decode speed improvements and 45–60% memory reductions over BF16 baselines on an Android OnePlus 12 device. The quantization scheme is designed around the ExecuTorch framework with ARM CPU backends.
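Using teacher logits as token-level targets can be sketched as a per-position divergence between the teacher and student distributions. The code below is a minimal illustration in plain Python, assuming a KL(teacher || student) objective averaged over positions. The summary does not state the exact loss Meta used, so the loss form, the function names, and the toy logits are all assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_distillation_loss(student_logits, teacher_logits):
    """Mean over token positions of KL(teacher || student).

    Each argument is a list of rows, one row of vocabulary logits per
    token position. This loss form is assumed for illustration only.
    """
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_p_teacher = log_softmax(t_row)
        log_p_student = log_softmax(s_row)
        total += sum(
            math.exp(lt) * (lt - ls)
            for lt, ls in zip(log_p_teacher, log_p_student)
        )
    return total / len(student_logits)


if __name__ == "__main__":
    # Toy example: 2 token positions, vocabulary of 3 (hypothetical values).
    teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 3.0]]
    student = [[1.5, 0.7, -0.5], [0.2, 0.8, 2.5]]
    print(token_distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution at every position and grows as the two diverge.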

Category	State	Score	Setup	Source
agent	scored	13.5	0-shot; missing: method, language, training state	self-reported
coding	scored	25.7	0-shot; missing: method, language, training state	self-reported
instruction_following	scored	59.5%	0-shot; missing: method, language, training state	self-reported
long_context	scored	75.0	0-shot; missing: method, language, training state	self-reported
long_context	scored	38.0	0-shot; missing: method, language, training state	self-reported
math	scored	44.4%	8-shot, CoT; missing: language, training state	self-reported
math	scored	30.6%	0-shot, CoT; missing: language, training state	self-reported
multilingual	scored	24.5%	0-shot, CoT; missing: language, training state	self-reported
other	scored	41.6	0-shot; missing: method, language, training state	self-reported
other	scored	20.3	0-shot; missing: method, language, training state	self-reported
other	scored	16.8	1-shot; missing: method, language, training state	self-reported
reasoning	scored	59.4%	0-shot; missing: method, language, training state	self-reported
reasoning	scored	41.2%	0-shot; missing: method, language, training state	self-reported
reasoning	scored	27.2%	0-shot; missing: method, language, training state	self-reported

Extracted Evaluations (14 results)