Claude 2 Model Card
What this is
Claude 2 is a general-purpose large language model developed by Anthropic, released in July 2023, superseding Claude 1.3 and Claude Instant 1.1. It uses a transformer architecture trained via unsupervised learning, RLHF, and Constitutional AI (supervised and RL phases). The card describes it as "a continuous evolution and a series of small, but meaningful improvements" rather than a transformative change from prior Claude models.
Capabilities
Claude 2 scores 71.2% on Codex HumanEval (0-shot), 88.0% on GSM8k (0-shot CoT), and 78.5% on MMLU (5-shot CoT). On standardized tests it reaches roughly the 95th percentile on GRE Verbal Reasoning (score 165) and the 91st percentile on GRE Analytical Writing (score 5.0), and it scores 76.5% on the MBE and between 63.3% and 68.9% across USMLE Steps 1–3. The model has been trained to support a 200K-token context window (~150,000 words), though Anthropic launched it with 100K-token support. It is text-only; exam questions that included images were removed or transcribed during evaluation.
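The token-to-word figure above implies a simple rule of thumb. The sketch below derives the ratio from the card's own approximation (200K tokens to roughly 150,000 words) and applies it to the launch window. The name `approx_words` and the 75,000-word result are illustrative derivations, not figures quoted in this summary, and real ratios vary with language and tokenizer.

```python
# Rule of thumb implied by the card's approximation:
# 200,000 tokens ~ 150,000 words, i.e. about 0.75 words per token.
WORDS_PER_TOKEN = 150_000 / 200_000  # assumption: the ratio is roughly constant


def approx_words(tokens: int) -> int:
    """Estimate a word count from a token count (rough, English-centric)."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    for label, tokens in [("trained context", 200_000), ("launch context", 100_000)]:
        print(f"{label}: {tokens:,} tokens ~ {approx_words(tokens):,} words")
```

This is only a sizing aid for deciding whether a document is likely to fit in the window; an actual token count requires the model's tokenizer.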
Evaluation methodology
Anthropic uses crowdworker Elo scoring to compare models on helpfulness, honesty, and harmlessness tasks, sampling at temperature T=1. Standard benchmarks include Codex HumanEval, GSM8k, MMLU, TriviaQA, QuALITY, ARC-Challenge, and RACE-H, each run under specified few-shot and chain-of-thought conditions. Alignment is measured with 438 binary-choice helpful, honest, and harmless (HHH) questions and a held-out set of 328 red-team prompts, where each response is scored against the fixed reference response "I can't help you with that." Bias is assessed with BBQ across 9 social dimensions. For TruthfulQA, responses are sampled open-ended and then mapped to multiple-choice options by a helper model that cannot see the original question.
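For readers unfamiliar with Elo scoring, the sketch below shows how a rating gap translates into an expected preference rate and how one pairwise comparison moves two ratings. It uses the conventional chess formulation (base 10, scale 400) and a K-factor of 32. These constants and the function names are assumptions for illustration; this summary does not state which variant Anthropic uses.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Probability that model A's response is preferred over model B's
    under the conventional Elo model (assumed: base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


def update(elo_a: float, elo_b: float, a_won: bool, k: float = 32.0):
    """Return new (elo_a, elo_b) after one comparison.
    k is a hypothetical step size, not a value taken from the card."""
    expected_a = expected_win_rate(elo_a, elo_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Ratings are zero-sum: whatever A gains, B loses.
    return elo_a + delta, elo_b - delta


if __name__ == "__main__":
    # Equal ratings mean each model is preferred half the time.
    print(expected_win_rate(1000, 1000))
    # A 100-point lead corresponds to being preferred about 64% of the time.
    print(round(expected_win_rate(1100, 1000), 3))
    print(update(1000, 1000, a_won=True))
```

Under this convention, only rating differences matter, so Elo scores are meaningful relative to a chosen baseline model rather than in absolute terms.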
Safety testing
Anthropic red-teamed Claude 2 for national security and safety-related risks, concluding "we do not believe any deployed versions of Claude pose national security or significant safety related risks." The Alignment Research Center (ARC) has audited Claude models since fall 2022 for autonomous replication capabilities; "neither ARC nor we believe that our current Claude models possess the dangerous capabilities ('autonomous replication' abilities) that ARC is aiming to detect." On 328 held-out adversarial prompts, Claude 2 produced responses judged more harmful than "I can't help you with that" in 4 cases; 3 were not harmful on manual inspection, while the fourth saw the model "disrupted by the jailbreak attempts in about half of its sampled responses." External crowdworker red teamers also tested Trust and Safety topics including misinformation, hate and discrimination, and child safety.
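The held-out red-team result can be restated as rates. The sketch below is plain arithmetic on the counts given above (328 prompts, 4 flagged, 3 cleared on manual inspection). The variable names are illustrative, and the single remaining case is only a partial failure, since the model was disrupted in about half of its sampled responses to that prompt.

```python
TOTAL_PROMPTS = 328      # held-out adversarial prompts
FLAGGED = 4              # judged more harmful than the reference refusal
CLEARED_ON_REVIEW = 3    # found not harmful on manual inspection


def rate(count: int, total: int) -> float:
    """Fraction of prompts in a category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


flagged_rate = rate(FLAGGED, TOTAL_PROMPTS)
remaining = FLAGGED - CLEARED_ON_REVIEW
remaining_rate = rate(remaining, TOTAL_PROMPTS)

if __name__ == "__main__":
    print(f"flagged by scoring: {FLAGGED}/{TOTAL_PROMPTS} = {flagged_rate:.2%}")
    print(f"remaining after review: {remaining}/{TOTAL_PROMPTS} = {remaining_rate:.2%}")
```

These rates describe one fixed prompt set and one scoring procedure, so they should not be read as a general jailbreak success rate.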
Mitigations
Constitutional AI training encodes ethical and behavioral principles across supervised and RL phases, instructing the model to avoid sexist, racist, and toxic outputs and to refuse assistance with illegal or unethical activities. Debiasing is addressed by generating unbiased samples and finetuning the model on them before the RL phase begins. Honesty and harmlessness interventions are layered in via RLHF, and an Acceptable Use Policy with Trust and Safety enforcement governs permitted use.
Deployment and access
Claude 2 is available via Anthropic's API and consumer products. Use is governed by Anthropic's Acceptable Use Policy. The card does not disclose pricing, parameter count, or specific access tiers.
Limitations
Claude 2 "still confabulates — getting facts wrong, hallucinating details, and filling in gaps in knowledge with fabrication," and the card states it should not be used alone in high-stakes situations. Training data cuts off in early 2023, and the model does not search the web by default unless connected to external tools. Multilingual performance is weaker on low-resource languages. The card also flags that adding new capabilities can trade off against existing ones "in unexpected ways," and that tracking and balancing these interactions "has become a new research problem."
What's new
Relative to Claude 1.3, Claude 2 improves coding (Codex HumanEval: 56.0% → 71.2%), math (GSM8k: 85.2% → 88.0%), and bias scores on BBQ. Maximum output length is extended to 4,000 tokens (~3,000 words), and structured-format generation (JSON, XML, YAML, markdown, code) is improved. The context window expands to 100K tokens at launch, and the model has been trained to support up to 200K tokens. Training data now includes updates through early 2023, and the share of non-English pretraining data has been increased.
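The benchmark changes can be read two ways: as absolute percentage-point gains, or as the share of the previous model's errors that were eliminated. The sketch below computes both from the scores quoted above. The function names are illustrative, and the error-reduction framing is an added lens rather than a metric reported in this summary.

```python
# (Claude 1.3 score, Claude 2 score) in percent, as quoted above.
SCORES = {
    "Codex HumanEval": (56.0, 71.2),
    "GSM8k": (85.2, 88.0),
}


def point_gain(old: float, new: float) -> float:
    """Absolute improvement in percentage points."""
    return new - old


def error_reduction(old: float, new: float) -> float:
    """Share of the old model's errors removed, as a percentage."""
    old_error = 100.0 - old
    if old_error <= 0:
        raise ValueError("old score leaves no errors to reduce")
    return 100.0 * (new - old) / old_error


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        print(
            f"{name}: +{point_gain(old, new):.1f} points, "
            f"{error_reduction(old, new):.1f}% fewer errors"
        )
```

The second view puts the smaller GSM8k gain in context: when the starting score is already high, a few points represent a larger share of the remaining errors.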