Claude 2 Model Card

A 635-word brief of a 5,907-word document. Published by Anthropic. Version dated Mar 31, 2026.

What this is

Claude 2 is a general-purpose large language model developed by Anthropic and released in July 2023, superseding Claude 1.3 and Claude Instant 1.1. It is a transformer-based model trained with unsupervised learning, reinforcement learning from human feedback (RLHF), and Constitutional AI (which has both a supervised and an RL phase). The card describes it as "a continuous evolution and a series of small, but meaningful improvements" rather than a transformative change from prior Claude models.

Capabilities

Claude 2 scores 71.2% on Codex HumanEval (0-shot), 88.0% on GSM8k (0-shot CoT), and 78.5% on MMLU (5-shot CoT). On standardized tests, it reaches roughly the 95th percentile on GRE Verbal Reasoning (score 165) and the 91st percentile on GRE Analytical Writing (score 5.0). It also scores 76.5% on the Multistate Bar Examination (MBE) and between 63.3% and 68.9% across USMLE Steps 1–3. The model was trained with a 200K-token context window (about 150,000 words), although Anthropic launched it with 100K-token support. It is text-only, so exam questions containing images were removed or transcribed during evaluation.
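The card's own approximations (200K tokens is about 150,000 words, and 4,000 output tokens is about 3,000 words) both imply roughly 0.75 words per token. Below is a minimal sketch of that rule of thumb. The helper names and the 100K default limit are illustrative assumptions. Real token counts depend on the tokenizer, the language, and the text itself.

```python
import math

# Rough ratio implied by the card's figures (150,000 words / 200,000 tokens).
# Assumption: real ratios vary with the tokenizer, language, and content.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Estimate the tokens needed for a given word count, rounding up."""
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(words: int, context_tokens: int = 100_000) -> bool:
    """Hypothetical check against the 100K-token window available at launch."""
    return words_to_tokens(words) <= context_tokens


if __name__ == "__main__":
    print(tokens_to_words(200_000))  # 150000, matching the card's figure
    print(tokens_to_words(4_000))    # 3000, the maximum output length
    print(fits_in_context(80_000))   # False: about 106,667 tokens
```

Treat these numbers as planning estimates only. Count tokens with the actual tokenizer before relying on a document fitting in the window.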

Evaluation methodology

Anthropic compares models using Elo scores derived from crowdworker preferences on helpfulness, honesty, and harmlessness tasks, with responses sampled at temperature T=1. Standard benchmarks include Codex HumanEval, GSM8k, MMLU, TriviaQA, QuALITY, ARC-Challenge, and RACE-H, each run under specified few-shot and chain-of-thought conditions. Alignment is measured with 438 binary-choice HHH (helpful, honest, harmless) questions and a held-out set of 328 red-team prompts scored against the fixed reference response "I can't help you with that." BBQ bias is assessed across 9 social dimensions. For TruthfulQA, responses are sampled open-ended and then mapped to multiple-choice options by a helper model that cannot see the original question.
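Elo scores are easiest to read as pairwise preference rates. The standard Elo convention uses a logistic curve with base 10 and a scale factor of 400. Under it, a rating gap maps to the probability that raters prefer one model's response over the other's. The sketch below assumes that convention. The summary does not state how Anthropic fits or scales its Elo scores, and the function names are illustrative.

```python
import math


def preference_rate(elo_a: float, elo_b: float) -> float:
    """Expected probability that A's response is preferred over B's (standard Elo)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def elo_gap(preference: float) -> float:
    """Invert preference_rate: the rating gap implied by a preference probability."""
    if not 0.0 < preference < 1.0:
        raise ValueError("preference must be strictly between 0 and 1")
    return 400.0 * math.log10(preference / (1.0 - preference))


if __name__ == "__main__":
    # A 100-point gap corresponds to A being preferred about 64% of the time.
    print(round(preference_rate(1100, 1000), 3))  # 0.64
    print(round(elo_gap(0.5), 6))                 # 0.0
```

Under this convention, equal ratings mean a 50/50 preference. Each additional 400 points multiplies the odds of being preferred by 10.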

Safety testing

Anthropic red-teamed Claude 2 for national security and other safety-related risks, concluding that "we do not believe any deployed versions of Claude pose national security or significant safety related risks." The Alignment Research Center (ARC) has audited Claude models for autonomous replication capabilities since fall 2022; "neither ARC nor we believe that our current Claude models possess the dangerous capabilities ('autonomous replication' abilities) that ARC is aiming to detect." On the 328 held-out adversarial prompts, Claude 2 produced responses judged more harmful than "I can't help you with that" in 4 cases. On manual inspection, 3 of these were not harmful. In the fourth, the model was "disrupted by the jailbreak attempts in about half of its sampled responses." External crowdworker red teamers also probed Trust and Safety topics, including misinformation, hate and discrimination, and child safety.
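The red-team metric compares sampled responses with the fixed refusal "I can't help you with that." and counts the prompts where the model did worse. Below is a minimal sketch. It assumes a hypothetical `harm_score(prompt, response)` judge, where a higher score means more harmful. It also assumes a prompt is flagged if any of its samples outscores the refusal. The summary specifies neither the judge nor the aggregation rule. For each flagged prompt, the sketch reports the fraction of samples that failed, which matches how the fourth case is described ("about half of its sampled responses").

```python
from typing import Callable, Dict, List

REFERENCE_RESPONSE = "I can't help you with that."

# Hypothetical judge: returns a harmfulness score, higher meaning more harmful.
HarmScore = Callable[[str, str], float]


def flag_prompts(
    samples: Dict[str, List[str]],
    harm_score: HarmScore,
    reference: str = REFERENCE_RESPONSE,
) -> Dict[str, float]:
    """Map each flagged prompt to the fraction of its samples judged more
    harmful than the reference refusal. Prompts with no such sample are omitted."""
    flagged: Dict[str, float] = {}
    for prompt, responses in samples.items():
        baseline = harm_score(prompt, reference)
        worse = sum(1 for r in responses if harm_score(prompt, r) > baseline)
        if worse:
            flagged[prompt] = worse / len(responses)
    return flagged


if __name__ == "__main__":
    # Toy judge for illustration only: any response containing "UNSAFE" is harmful.
    def toy_judge(prompt: str, response: str) -> float:
        return 1.0 if "UNSAFE" in response else 0.0

    toy = {"p1": ["I won't do that.", "No."], "p2": ["UNSAFE text", "I can't help."]}
    print(flag_prompts(toy, toy_judge))  # {'p2': 0.5}
```

With the card's numbers, 4 flagged prompts out of 328 is a flag rate of about 1.2%, before the manual review that cleared 3 of them.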

Mitigations

Constitutional AI training encodes ethical and behavioral principles across supervised and RL phases, instructing the model to avoid sexist, racist, and toxic outputs and to refuse assistance with illegal or unethical activities. Debiasing is addressed by generating unbiased samples and finetuning the model on them before the RL phase begins. Honesty and harmlessness interventions are layered in via RLHF, and an Acceptable Use Policy with Trust and Safety enforcement governs permitted use.
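Anthropic's Constitutional AI paper describes the supervised phase as a loop. The model critiques its own draft against a written principle, then revises the draft. The revised responses become finetuning data before RL begins. Below is a minimal sketch of that loop. It assumes a hypothetical `generate(text) -> text` function standing in for the model. The prompt wording and function names are illustrative, not Anthropic's.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language-model call: prompt text in, completion out.
Generate = Callable[[str], str]


def critique_and_revise(prompt: str, principles: List[str], generate: Generate) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = generate(prompt)
    for principle in principles:
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


def build_finetuning_pairs(
    prompts: List[str], principles: List[str], generate: Generate
) -> List[Tuple[str, str]]:
    """Collect (prompt, revised response) pairs to finetune on before the RL phase."""
    return [(p, critique_and_revise(p, principles, generate)) for p in prompts]


if __name__ == "__main__":
    # Toy generator for illustration: labels which step produced each string.
    def toy_generate(text: str) -> str:
        if "Rewrite the response" in text:
            return "revised answer"
        if "Critique the response" in text:
            return "critique"
        return "draft answer"

    principles = ["Avoid toxic, racist, or sexist content."]
    print(build_finetuning_pairs(["Tell me a joke."], principles, toy_generate))
    # [('Tell me a joke.', 'revised answer')]
```

Each principle costs two extra model calls per prompt (one critique, one revision). The number of principles applied per sample is therefore a cost knob in this sketch.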

Deployment and access

Claude 2 is available via Anthropic's API and consumer products. Use is governed by Anthropic's Acceptable Use Policy. The card does not disclose pricing, parameter count, or specific access tiers.

Limitations

Claude 2 "still confabulates — getting facts wrong, hallucinating details, and filling in gaps in knowledge with fabrication," and the card states it should not be used alone in high-stakes situations. Training data cuts off in early 2023, and the model does not search the web by default unless connected to external tools. Multilingual performance is weaker on low-resource languages. The card also flags that adding new capabilities can trade off against existing ones "in unexpected ways," and that tracking and balancing these interactions "has become a new research problem."

What's new

Relative to Claude 1.3, Claude 2 improves coding (Codex HumanEval: 56.0% → 71.2%), math (GSM8k: 85.2% → 88.0%), and bias scores on BBQ. Maximum output length is extended to 4,000 tokens (about 3,000 words), and generation of structured formats (JSON, XML, YAML, markdown, code) is improved. The context window expands to 100K tokens at launch, and the model was trained to support 200K tokens. Training data now includes updates through early 2023, and non-English pretraining data has been increased.
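The headline gains can also be restated as absolute (percentage-point) and relative improvements. The small sketch below uses only the Claude 1.3 and Claude 2 scores quoted in this section. The helper name is illustrative.

```python
from typing import Tuple

# Scores in percent, as quoted in this summary (Claude 1.3 -> Claude 2).
SCORES = {
    "Codex HumanEval": (56.0, 71.2),
    "GSM8k": (85.2, 88.0),
}


def gains(old: float, new: float) -> Tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


if __name__ == "__main__":
    for bench, (old, new) in SCORES.items():
        points, relative = gains(old, new)
        print(f"{bench}: +{points} points ({relative}% relative)")
    # Codex HumanEval: +15.2 points (27.1% relative)
    # GSM8k: +2.8 points (3.3% relative)
```

In relative terms, the coding gain is roughly eight times the math gain on these two benchmarks.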

Extracted evaluations (7 results)

Category	State	Score	Shots	Method	Source
coding	scored	52.8%	0-shot	not reported	self-reported
general_knowledge	scored	78.9	5-shot	not reported	self-reported
knowledge	scored	73.4%	5-shot	CoT	self-reported
long_context	scored	80.5	5-shot	not reported	self-reported
math	scored	80.9%	0-shot	CoT	self-reported
other	scored	85.5	5-shot	not reported	self-reported
reasoning	scored	85.7%	5-shot	not reported	self-reported

Language and training state are not reported for any row.
