Grok 4.1 Model Card

A 592-word brief of a 2,258-word document. Published by xAI. Version dated Mar 31, 2026.

What this is

Grok 4.1 is a language model released by xAI on November 17, 2025, as an update to Grok 4 and Grok 3. It is designed for more natural, fluid dialogue while maintaining strong core reasoning capabilities. It ships in two configurations: Grok 4.1 Non-Thinking (NT), which responds directly, and Grok 4.1 Thinking (T), which reasons before responding.

Capabilities

Grok 4.1 T scores 0.87 on WMDP Biology, 0.84 on WMDP Chemistry, and 0.84 on WMDP Cybersecurity. On the Virology Capabilities Test it scores 0.61 against a human baseline of 0.22, and it matches or outperforms human baselines on knowledge and protocol troubleshooting tasks. It performs below human baselines on multimodal and multi-step reasoning benchmarks (FigQA: 0.34 vs. 0.77 human; CloningScenarios: 0.46 vs. 0.60 human). Context window size and parameter count are not disclosed.

Evaluation methodology

xAI structures evaluations around three risk categories from its Risk Management Framework: abuse potential, concerning propensities, and dual-use capabilities. Refusal testing uses an internal multilingual dataset (English, Spanish, Chinese, Japanese, Arabic, Russian) of several thousand diverse violative prompts, graded by a separate model; xAI notes that prior model cards contained an error where only English prompts were evaluated, making those figures not directly comparable. Agentic safety uses AgentHarm and AgentDojo; honesty is assessed on the 1,000-question MASK benchmark; and dual-use capabilities are measured on seven public benchmarks with safeguards removed to capture full model capability.
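
The card reports refusal results as answer rates but does not publish its grading code. The sketch below assumes, for illustration only, that the grader model emits one answered/refused verdict per prompt and that the answer rate is the fraction of violative prompts graded as answered. The `GradedPrompt` record and both functions are hypothetical, and the toy data is made up rather than taken from xAI's results. Computing the rate per language shows why an English-only figure need not match a multilingual one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GradedPrompt:
    """Hypothetical record: one violative prompt and the grader model's verdict."""
    language: str   # e.g. "en", "es", "zh", "ja", "ar", "ru"
    answered: bool  # True if the grader judged that the model complied


def answer_rate(records):
    """Fraction of violative prompts graded as answered (lower is safer)."""
    if not records:
        raise ValueError("no graded prompts")
    return sum(r.answered for r in records) / len(records)


def answer_rate_by_language(records):
    """Answer rate computed separately for each prompt language."""
    groups = defaultdict(list)
    for r in records:
        groups[r.language].append(r)
    return {lang: answer_rate(rs) for lang, rs in sorted(groups.items())}


# Toy data (made up): English prompts are refused more often than Spanish ones.
toy = [
    GradedPrompt("en", False), GradedPrompt("en", False),
    GradedPrompt("en", False), GradedPrompt("en", True),
    GradedPrompt("es", True), GradedPrompt("es", False),
]
english_only = [r for r in toy if r.language == "en"]

print(answer_rate(english_only))     # 0.25
print(round(answer_rate(toy), 3))    # 0.333
print(answer_rate_by_language(toy))  # {'en': 0.25, 'es': 0.5}
```

In this toy case the English-only rate understates the pooled rate, which is the kind of mismatch that makes English-only figures from earlier cards not directly comparable.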

Safety testing

On internal violative prompts, Grok 4.1 T has a chat answer rate of 0.07 and NT a rate of 0.05; under adversarial jailbreaks, those rates fall to 0.02 and 0.00. On AgentHarm (agentic malicious tasks without jailbreaks), answer rates are 0.14 (T) and 0.04 (NT). The input filter's false negative rate on restricted biology is 0.03 for direct requests and 0.20 under prompt injection; for restricted chemistry it is 0.00 direct and 0.12 under prompt injection. On CBRN dual-use benchmarks, Grok 4.1's performance is "broadly similar" to that of Grok 4 and other frontier models. On CyBench it achieves an unguided success rate of 0.39, which xAI describes as "substantially below the level of human cybersecurity [experts]."
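
The input-filter figures are false negative rates: of the prompts that genuinely fall in a restricted category, the share the filter fails to flag. A minimal sketch of that standard definition follows, assuming per-prompt ground-truth labels and filter decisions are available as parallel lists; the function name and toy inputs are hypothetical and do not reproduce xAI's evaluation data.

```python
def false_negative_rate(is_restricted, flagged):
    """Share of truly restricted prompts that the filter failed to flag.

    FNR = FN / (FN + TP); benign prompts (is_restricted == False) are ignored.
    """
    if len(is_restricted) != len(flagged):
        raise ValueError("label and prediction lists must have equal length")
    restricted_flags = [f for r, f in zip(is_restricted, flagged) if r]
    if not restricted_flags:
        raise ValueError("no restricted prompts to score")
    missed = sum(1 for f in restricted_flags if not f)
    return missed / len(restricted_flags)


# Toy setup (made-up numbers): 20 restricted prompts plus 5 benign ones.
labels = [True] * 20 + [False] * 5
# Direct phrasing: the filter misses 1 of the 20 restricted prompts.
direct_flags = [True] * 19 + [False] + [False] * 5
# Same prompts wrapped in a prompt injection: the filter misses 4 of 20.
injected_flags = [True] * 16 + [False] * 4 + [False] * 5

print(false_negative_rate(labels, direct_flags))    # 0.05
print(false_negative_rate(labels, injected_flags))  # 0.2
```

Because benign prompts are excluded from the denominator, this rate says nothing about over-blocking; that would need a separate false positive rate.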

Mitigations

xAI deploys a "new and more robust input filter model" covering restricted biological and chemical knowledge, CSAM, self-harm, and weapons-related requests, trained on synthetic and production data with adversarial attack augmentation. The underlying model is trained with supervised finetuning and reinforcement learning from human feedback, with targeted honesty and anti-sycophancy training to reduce deceptive and sycophantic responses. xAI states it "will continue to explore additional mitigations, such as real-time safety monitoring" for agentic settings. The card does not invoke an ASL or FSF tier.

Deployment and access

Grok 4.1 is publicly available through xAI's web and mobile consumer apps in both Thinking and Non-Thinking configurations. No API access details, license terms, or explicit access restrictions are disclosed in this card.

Limitations

xAI flags that agentic safety performance on AgentHarm warrants continued mitigation work. The input filter shows elevated false negative rates under prompt injection (0.20 for biology, 0.12 for chemistry), which xAI intends to improve. Sycophancy rates increased relative to Grok 4 (0.07 to 0.19–0.23 depending on configuration) and MASK dishonesty rates also rose slightly (0.43 to 0.49 for T). xAI notes that human baselines used in comparisons "likely underestimate the performance of high-context experts with experience in a particular question domain."

What's new

Grok 4.1 updates Grok 4 and Grok 3 with an emphasis on more natural, fluid dialogue. A new, more robust input filter model replaces prior versions. Multilingual evaluation is now reported correctly across all six languages; prior model cards evaluated only English prompts due to an evaluation settings error, making those results not directly comparable to figures in this card.

Extracted evaluations (17 results)

Benchmark	Category	State	Score	Setup	Source
CyBench	agent	scored	0.4 success rate	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
WMDP / bio	other	scored	0.9 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
WMDP / chem	other	scored	0.8 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
WMDP / cyber	other	scored	0.8 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
Lab-Bench / protocol_qa	other	scored	0.8 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
Virology Capabilities Test	other	scored	0.6 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
MASK	other	scored	0.5 dishonesty rate	instruction-tuned; missing: shot count, method, language	self-reported
Lab-Bench / cloning_scenarios	other	scored	0.5 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
	other	scored	0.4 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
Lab-Bench / fig_qa	other	scored	0.3 accuracy	without-safeguards, instruction-tuned; missing: shot count, language	self-reported
	other	scored	0.2 sycophancy rate	instruction-tuned; missing: shot count, method, language	self-reported
/ without_jailbreaks	other	scored	0.1 answer rate	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	0.1 answer rate	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	0.1 attack success rate	instruction-tuned; missing: shot count, method, language	self-reported
/ system_jailbreak	other	scored	0.0 answer rate	instruction-tuned; missing: shot count, method, language	self-reported
/ user_jailbreak	other	scored	0.0 answer rate	instruction-tuned; missing: shot count, method, language	self-reported
	other	scored	0.0 win rate	without-safeguards, instruction-tuned; missing: shot count, language	self-reported