Llama Guard 3 Vision Card

A 499-word brief of a 1,531-word document. Published by Meta AI. Version dated Mar 31, 2026.
01

What this is

Llama Guard 3 Vision is a content safety classifier released by Meta AI, built on Llama-3.2-11B and fine-tuned for multimodal prompt and response classification. It extends the Llama Guard family (versions 1–3) by adding image understanding support. The card carries a version date of 2026-03-31. Its stated purpose is to detect harmful multimodal (text and image) inputs and text responses generated by LLMs.

02

Capabilities

The model classifies content across 13 hazard categories drawn from the MLCommons taxonomy, covering topics from violent crimes to elections. On an internal test set, it achieves an F1 score of 0.733 for prompt classification and 0.938 for response classification, with false positive rates of 0.052 and 0.016 respectively. It supports English only, accepts one image at a time, and rescales each image into four 560×560 chunks. Category-level F1 for response classification ranges from 0.698 (Child Exploitation) to 0.995 (Indiscriminate Weapons).
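For readers who want to relate the F1 and false positive rate figures to raw classification counts, here is a minimal sketch of how both metrics follow from a confusion matrix. The counts in `example` are hypothetical and chosen only for easy arithmetic; they are not taken from the card and do not reproduce its reported numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts, treating "unsafe" as the positive class."""
    tp: int  # unsafe items correctly flagged
    fp: int  # safe items wrongly flagged
    fn: int  # unsafe items missed
    tn: int  # safe items correctly passed


def precision(c: Confusion) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    unsafe = c.tp + c.fn
    return c.tp / unsafe if unsafe else 0.0


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_rate(c: Confusion) -> float:
    """Share of truly safe items that were flagged as unsafe."""
    safe = c.fp + c.tn
    return c.fp / safe if safe else 0.0


# Hypothetical counts for illustration only; not from the card.
example = Confusion(tp=80, fp=10, fn=20, tn=190)
```

Because F1 ignores true negatives while the false positive rate depends on them, the two figures answer different questions: F1 summarizes how well unsafe content is caught, and the false positive rate indicates how often benign content would be blocked.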

03

Evaluation methodology

Meta evaluated the model on an internal test set aligned to the MLCommons hazard taxonomy. GPT-4o and GPT-4o mini with zero-shot prompting serve as baselines; Meta states the model is "the first safety classifier for the LLM image understanding task." No external auditors, contamination controls, or holdout procedures are described in the card.

04

Safety testing

The card does not describe a red-team exercise or adversarial safety evaluation distinct from the classification benchmarks. Coverage of high-risk categories such as S9 (Indiscriminate Weapons, including chemical, biological, radiological, and nuclear weapons) is shown only through the category-level F1 breakdown. The card acknowledges the model may be "susceptible to adversarial attacks that could bypass or alter its intended use" and points to prior published attack research.

05

Mitigations

Llama Guard 3 Vision is itself a mitigation layer intended to wrap LLM inputs and outputs. To build its training data, Meta used jailbreaking techniques to elicit violating responses. For text-only pipelines, Meta recommends Llama Guard 3-8B or Llama Guard 3-1B rather than this model. No classifier score thresholds, refusal-training details, or ASL/FSF tier assignments are disclosed.

06

Deployment and access

The card states users must obtain model weights and then follow documentation at llama.com to get started. No license type, API surface, pricing, or explicit access restrictions are stated in the card text. The model is positioned for integration into LLM pipelines as a pre- or post-generation safety filter.
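The pre- and post-generation filter pattern described here can be sketched as a thin wrapper around a generator. This is an illustration under stated assumptions, not Meta's integration code: `guarded_generate`, the callable signatures, the refusal text, and the stub functions are all hypothetical, and a real deployment would replace the stubs with calls to the classifier and the LLM.

```python
from typing import Callable

# Hypothetical signatures: each classifier returns True when content is unsafe.
PromptClassifier = Callable[[str, bytes], bool]
ResponseClassifier = Callable[[str, bytes, str], bool]
Generator = Callable[[str, bytes], str]

REFUSAL = "Sorry, I can't help with that request."  # placeholder refusal text


def guarded_generate(
    prompt: str,
    image: bytes,
    generate: Generator,
    prompt_is_unsafe: PromptClassifier,
    response_is_unsafe: ResponseClassifier,
    refusal: str = REFUSAL,
) -> str:
    """Run a text+image request through both safety checks."""
    # Pre-generation filter: stop before the LLM is ever called.
    if prompt_is_unsafe(prompt, image):
        return refusal
    response = generate(prompt, image)
    # Post-generation filter: check the reply in the context of the request.
    if response_is_unsafe(prompt, image, response):
        return refusal
    return response


# Stand-in stubs so the sketch runs without any model weights.
def stub_generate(prompt: str, image: bytes) -> str:
    return f"echo: {prompt}"


def stub_prompt_check(prompt: str, image: bytes) -> bool:
    return "forbidden" in prompt.lower()


def stub_response_check(prompt: str, image: bytes, response: str) -> bool:
    return "secret" in response.lower()
```

Calling `guarded_generate("hello", b"", stub_generate, stub_prompt_check, stub_response_check)` passes both checks and returns the stub reply, while a prompt containing "forbidden" is refused without invoking the generator.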

07

Limitations

The model is optimized for English and supports only one image at a time; classification performance varies with image resolution due to fixed rescaling. It is not designed for image-only or text-only safety classification. Meta states that categories requiring current factual knowledge (Defamation (S5), Intellectual Property (S8), and Elections (S13)) "may require more complex systems" for high-sensitivity deployments. Performance on multilingual content and on edge cases requiring common-sense reasoning is bounded by pretraining data.
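One way a deployment might act on the S5, S8, and S13 caveat is to route verdicts in those categories to a more capable downstream system rather than blocking outright. The sketch below assumes the classifier returns text of the form `safe`, or `unsafe` followed by a line of comma-separated category codes; that format is an assumption based on the Llama Guard family's published convention and is not stated in this brief. The function names and the three routing labels are hypothetical.

```python
from typing import List, Tuple

# Categories the card flags as needing current factual knowledge.
NEEDS_CURRENT_FACTS = frozenset({"S5", "S8", "S13"})


def parse_verdict(raw: str) -> Tuple[bool, List[str]]:
    """Parse assumed classifier output into (is_unsafe, category_codes).

    Assumed format: first non-empty line is "safe" or "unsafe"; if unsafe,
    the next non-empty line holds comma-separated codes such as "S1,S9".
    """
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    label = lines[0].lower()
    if label == "safe":
        return False, []
    if label != "unsafe":
        raise ValueError(f"unexpected label: {lines[0]!r}")
    codes: List[str] = []
    if len(lines) > 1:
        codes = [c.strip().upper() for c in lines[1].split(",") if c.strip()]
    return True, codes


def route(raw: str) -> str:
    """Map a verdict to a hypothetical action: allow, block, or escalate."""
    unsafe, codes = parse_verdict(raw)
    if not unsafe:
        return "allow"
    if NEEDS_CURRENT_FACTS.intersection(codes):
        return "escalate"  # hand off to a system with up-to-date knowledge
    return "block"
```

Note that this routing only helps when the classifier already flags the content; a false negative in these categories still passes as `allow`, so high-sensitivity deployments may need an independent check as well.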

08

What's new

Llama Guard 3 Vision is the first version in the Llama Guard 3 line to support multimodal (text+image) classification, extending prior text-only models. The card gives specific attention to person-identification risks in images: the model is trained to flag responses that attempt to identify real individuals from visual cues. No explicit changelog entries or version delta table appear in the card.

Generated by Claude Sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original document (source SHA f5746d19b1c5).