Grok 4.1 Model Card

model card2,258 words·10 min read·Mar 31, 2026·Source
Summary

Grok 4.1 Model Card

A 592-word brief of a 2,258-word document. Published by xAI. Version dated Mar 31, 2026.
01

What this is

Grok 4.1 is a language model released by xAI on November 17, 2025, as an update to Grok 4 and Grok 3. It is designed for more natural, fluid dialogue while maintaining strong core reasoning capabilities. It ships in two configurations: Grok 4.1 Non-Thinking (NT), which responds directly, and Grok 4.1 Thinking (T), which reasons before responding.

02

Capabilities

Grok 4.1 T scores 0.87 on WMDP Biology, 0.84 on WMDP Chemistry, and 0.84 on WMDP Cybersecurity. On the Virology Capabilities Test it scores 0.61 against a human baseline of 0.22, and it matches or outperforms human baselines on knowledge and protocol troubleshooting tasks. It performs below human baselines on multimodal and multi-step reasoning benchmarks (FigQA: 0.34 vs. 0.77 human; CloningScenarios: 0.46 vs. 0.60 human). Context window size and parameter count are not disclosed.

03

Evaluation methodology

xAI structures evaluations around three risk categories from its Risk Management Framework: abuse potential, concerning propensities, and dual-use capabilities. Refusal testing uses an internal multilingual dataset (English, Spanish, Chinese, Japanese, Arabic, Russian) of several thousand diverse violative prompts, graded by a separate model; xAI notes that prior model cards contained an error where only English prompts were evaluated, making those figures not directly comparable. Agentic safety uses AgentHarm and AgentDojo; honesty is assessed on the 1,000-question MASK benchmark; and dual-use capabilities are measured on seven public benchmarks with safeguards removed to capture full model capability.

04

Safety testing

On internal violative prompts, Grok 4.1 T has a chat answer rate of 0.07 and NT of 0.05; under adversarial jailbreak, those rates fall to 0.02 and 0.00. On AgentHarm (agentic malicious tasks without jailbreaks), answer rates are 0.14 (T) and 0.04 (NT). The input filter's false negative rate on restricted biology is 0.03 direct and 0.20 under prompt injection; for restricted chemistry it is 0.00 direct and 0.12 under prompt injection. On CBRN dual-use benchmarks Grok 4.1 performs "broadly similar" to Grok 4 and other frontier models, and on CyBench scores an unguided success rate of 0.39, described as "substantially below the level of human cybersecurity experiments."

05

Mitigations

xAI deploys a "new and more robust input filter model" covering restricted biological and chemical knowledge, CSAM, self-harm, and weapons-related requests, trained on synthetic and production data with adversarial attack augmentation. The underlying model is trained with supervised finetuning and reinforcement learning on human feedback, with targeted honesty and anti-sycophancy training to reduce deception and sycophantic responses. xAI states it "will continue to explore additional mitigations, such as real-time safety monitoring" for agentic settings. No ASL or FSF tier is invoked in this card.

06

Deployment and access

Grok 4.1 is publicly available through xAI's web and mobile consumer apps in both Thinking and Non-Thinking configurations. No API access details, license terms, or explicit access restrictions are disclosed in this card.

07

Limitations

xAI flags that agentic safety performance on AgentHarm warrants continued mitigation work. The input filter shows elevated false negative rates under prompt injection (0.20 for biology, 0.12 for chemistry), which xAI intends to improve. Sycophancy rates increased relative to Grok 4 (0.07 to 0.19–0.23 depending on configuration) and MASK dishonesty rates also rose slightly (0.43 to 0.49 for T). xAI notes that human baselines used in comparisons "likely underestimate the performance of high-context experts with experience in a particular question domain."

08

What's new

Grok 4.1 updates Grok 4 and Grok 3 with an emphasis on more natural, fluid dialogue. A new, more robust input filter model replaces prior versions. Multilingual evaluation is now reported correctly across all six languages; prior model cards evaluated only English prompts due to an evaluation settings error, making those results not directly comparable to figures in this card.

Generated by Claude sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original: original document · source SHA bb501dc9318f.

Extracted Evaluations(19 results)

Sort by:19 evals
BenchmarkCategoryStateScoreVariantSource
Cybenchagentscored0.4unguidedself-reported
WMDP Biootherscored0.9-self-reported
WMDP Chemotherscored0.8-self-reported
WMDP Cyberotherscored0.8-self-reported
ProtocolQAotherscored0.8-self-reported
VCTotherscored0.6-self-reported
BioLP-Benchotherscored0.5-self-reported
CloningScenariosotherscored0.5-self-reported
MASKotherscored0.4-self-reported
FigQAotherscored0.3-self-reported
Input Filter Restricted Biologyotherscored0.2+Prompt Injectionself-reported
AgentHarmotherscored0.1-self-reported
MakeMeSayotherscored0.1-self-reported
Input Filter Restricted Chemistryotherscored0.1+Prompt Injectionself-reported
Sycophancyotherscored0.1-self-reported
Chat Refusalsotherscored0.1-self-reported
AgentDojootherscored0.1-self-reported
Input Filter Restricted Biologyotherscored0.0-self-reported
Input Filter Restricted Chemistryotherscored0.0-self-reported