Claude 2 Model Card

Model card · 5,907 words · 26 min read · Mar 31, 2026
Summary

A 635-word brief of a 5,907-word document. Published by Anthropic. Version dated Mar 31, 2026.

01

What this is

Claude 2 is a general-purpose large language model developed by Anthropic, released in July 2023, superseding Claude 1.3 and Claude Instant 1.1. It uses a transformer architecture trained via unsupervised learning, RLHF, and Constitutional AI (supervised and RL phases). The card describes it as "a continuous evolution and a series of small, but meaningful improvements" rather than a transformative change from prior Claude models.

02

Capabilities

Claude 2 scores 71.2% on Codex HumanEval (0-shot), 88.0% on GSM8k (0-shot CoT), and 78.5% on MMLU (5-shot CoT). On standardized tests it reaches approximately the 95th percentile on GRE Verbal Reasoning (score 165) and the 91st percentile on GRE Analytical Writing (score 5.0); it scores 76.5% on the Multistate Bar Examination (MBE) and between 63.3% and 68.9% across USMLE Steps 1–3. The model is trained with a 200K-token context window (~150,000 words), though Anthropic launched with support for 100K tokens. It is text-only; exam questions containing images were removed or transcribed during evaluation.
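
The word count quoted for the context window implies a ratio of roughly 0.75 words per token (150,000 words / 200,000 tokens). A minimal sketch of that conversion, assuming the ratio holds uniformly; the name `approx_words` is hypothetical, and real tokenizer ratios vary with the text:

```python
# Rough tokens-to-words conversion implied by the figures above.
# Assumption: ~0.75 words per token (150,000 words / 200,000 tokens).
WORDS_PER_TOKEN = 150_000 / 200_000


def approx_words(tokens: int) -> int:
    """Estimate a word count from a token count using the assumed ratio."""
    return round(tokens * WORDS_PER_TOKEN)


for label, tokens in [("trained context", 200_000), ("launch context", 100_000)]:
    print(f"{label}: {tokens:,} tokens ~ {approx_words(tokens):,} words")
```

Under this assumption, the 100K-token launch window corresponds to roughly 75,000 words.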

03

Evaluation methodology

Anthropic uses crowdworker Elo scoring to compare models on helpfulness, honesty, and harmlessness tasks, sampling at temperature T=1. Standard benchmarks include Codex HumanEval, GSM8k, MMLU, TriviaQA, QuALITY, ARC-Challenge, and RACE-H under specified few-shot and chain-of-thought conditions. Alignment is measured via 438 binary-choice HHH questions and a held-out set of 328 red-team prompts scored against the fixed reference response "I can't help you with that." BBQ bias is assessed across 9 social dimensions; TruthfulQA responses are sampled open-ended and then mapped to multiple-choice options by a helper model that cannot see the original question.
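
The summary does not spell out how pairwise crowdworker preferences become Elo scores. As an illustration only, the sketch below applies the standard Elo update to a handful of comparisons. The starting rating of 1000, the K-factor of 32, the model names, and the comparison outcomes are all assumptions, not values from the card:

```python
# Standard Elo rating update from pairwise preference judgments.
# Assumptions (not from the card): starting rating 1000, K-factor 32,
# and the hypothetical model names and comparison outcomes below.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_preferred: bool,
           k: float = 32.0) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one crowdworker comparison."""
    actual = 1.0 if a_preferred else 0.0
    delta = k * (actual - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


ratings = {"model_a": 1000.0, "model_b": 1000.0}

# Each tuple: (first model, second model, was the first model preferred?)
comparisons = [
    ("model_a", "model_b", True),
    ("model_a", "model_b", True),
    ("model_a", "model_b", False),
]

for first, second, first_preferred in comparisons:
    ratings[first], ratings[second] = update(
        ratings[first], ratings[second], first_preferred
    )

print(ratings)
```

Under the Elo model, a 100-point rating gap corresponds to the higher-rated model being preferred in about 64% of comparisons.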

04

Safety testing

Anthropic red-teamed Claude 2 for national security and safety-related risks, concluding "we do not believe any deployed versions of Claude pose national security or significant safety related risks." The Alignment Research Center (ARC) has audited Claude models since fall 2022 for autonomous replication capabilities; "neither ARC nor we believe that our current Claude models possess the dangerous capabilities ('autonomous replication' abilities) that ARC is aiming to detect." On 328 held-out adversarial prompts, Claude 2 produced responses judged more harmful than "I can't help you with that" in 4 cases; 3 were not harmful on manual inspection, while the fourth saw the model "disrupted by the jailbreak attempts in about half of its sampled responses." External crowdworker red teamers also tested Trust and Safety topics including misinformation, hate and discrimination, and child safety.
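
To put the held-out red-team counts in proportion, the short calculation below converts them to rates. It uses only the counts quoted above; treating the single case left after manual inspection as a separate rate is an interpretive assumption, since the card describes that case as disrupted in only about half of its sampled responses:

```python
# Rates derived from the held-out red-team counts quoted above.
TOTAL_PROMPTS = 328            # held-out adversarial prompts
FLAGGED = 4                    # judged more harmful than the reference reply
CLEARED_ON_INSPECTION = 3      # found not harmful on manual inspection


def percent(count: int, total: int) -> float:
    """Express count/total as a percentage."""
    return 100.0 * count / total


flagged_rate = percent(FLAGGED, TOTAL_PROMPTS)
# Assumption: count the one remaining case as a full failure for this rate.
remaining_rate = percent(FLAGGED - CLEARED_ON_INSPECTION, TOTAL_PROMPTS)

print(f"flagged: {flagged_rate:.2f}% of prompts")
print(f"remaining after inspection: {remaining_rate:.2f}% of prompts")
```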

05

Mitigations

Constitutional AI training encodes ethical and behavioral principles across supervised and RL phases, instructing the model to avoid sexist, racist, and toxic outputs and to refuse assistance with illegal or unethical activities. Debiasing is addressed by generating unbiased samples and finetuning the model on them before the RL phase begins. Honesty and harmlessness interventions are layered in via RLHF, and an Acceptable Use Policy with Trust and Safety enforcement governs permitted use.

06

Deployment and access

Claude 2 is available via Anthropic's API and consumer products. Use is governed by Anthropic's Acceptable Use Policy. The card does not disclose pricing, parameter count, or specific access tiers.

07

Limitations

Claude 2 "still confabulates — getting facts wrong, hallucinating details, and filling in gaps in knowledge with fabrication," and the card states it should not be used alone in high-stakes situations. Training data cuts off in early 2023, and the model does not search the web by default unless connected to external tools. Multilingual performance is weaker on low-resource languages. The card also flags that adding new capabilities can trade off against existing ones "in unexpected ways," and that tracking and balancing these interactions "has become a new research problem."

08

What's new

Relative to Claude 1.3, Claude 2 improves coding (Codex HumanEval: 56.0% → 71.2%), math (GSM8k: 85.2% → 88.0%), and bias scores on BBQ. Maximum output length is extended to 4,000 tokens (~3,000 words), and structured-format generation (JSON, XML, YAML, markdown, code) is improved. The context window expands to 100K tokens at launch, with the model trained to support 200K tokens. Training data now extends through early 2023, and the share of non-English pretraining data has been increased.
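
The benchmark gains can be read either as percentage-point differences or as relative improvements. A small sketch of both, using only the Claude 1.3 and Claude 2 scores quoted above; the helper name `gain` is hypothetical:

```python
# Absolute (percentage-point) and relative gains, Claude 1.3 -> Claude 2.
# Scores are the ones quoted in the paragraph above.
scores = {
    "Codex HumanEval": (56.0, 71.2),
    "GSM8k": (85.2, 88.0),
}


def gain(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    points = new - old
    return points, 100.0 * points / old


for benchmark, (old, new) in scores.items():
    points, relative = gain(old, new)
    print(f"{benchmark}: +{points:.1f} points ({relative:.1f}% relative)")
```

The coding gain is much larger on both measures: about 15 points (roughly 27% relative) on Codex HumanEval, against about 3 points (roughly 3% relative) on GSM8k.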

Generated by Claude Sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original document (source SHA 9d0ae4edc4fd).

Extracted Evaluations (9 results)

Benchmark                   Category           State   Score  Variant     Source
HumanEval                   coding             scored  52.8   0-shot      self-reported
ARC-Challenge               general_knowledge  scored  85.7   5-shot      self-reported
TriviaQA                    general_knowledge  scored  78.9   5-shot      self-reported
MMLU                        general_knowledge  scored  73.4   5-shot CoT  self-reported
QuALITY                     long_context       scored  80.5   5-shot      self-reported
GSM8K                       math               scored  80.9   0-shot CoT  self-reported
GRE Verbal Reasoning        other              scored  165.0  5-shot CoT  self-reported
GRE Quantitative Reasoning  other              scored  154.0  5-shot CoT  self-reported
RACE-H                      other              scored  85.5   5-shot      self-reported