Grok 4.1 Model Card
What this is
Grok 4.1 is a language model released by xAI on November 17, 2025, as an update to Grok 4 and Grok 3. It is designed for more natural, fluid dialogue while maintaining strong core reasoning capabilities. It ships in two configurations: Grok 4.1 Non-Thinking (NT), which responds directly, and Grok 4.1 Thinking (T), which reasons before responding.
Capabilities
Grok 4.1 T scores 0.87 on WMDP Biology, 0.84 on WMDP Chemistry, and 0.84 on WMDP Cybersecurity. On the Virology Capabilities Test it scores 0.61 against a human baseline of 0.22, and it matches or outperforms human baselines on knowledge and protocol troubleshooting tasks. It performs below human baselines on multimodal and multi-step reasoning benchmarks (FigQA: 0.34 vs. 0.77 human; CloningScenarios: 0.46 vs. 0.60 human). Context window size and parameter count are not disclosed.
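The human-baseline comparisons above reduce to simple differences. This is a minimal sketch that uses only the Grok 4.1 T scores and human baselines quoted in this card. The `BASELINES` table and the `gap_vs_human` helper are illustrative names, not part of xAI's reporting.

```python
# Scores quoted in this card: (Grok 4.1 T score, human baseline).
BASELINES = {
    "Virology Capabilities Test": (0.61, 0.22),
    "FigQA": (0.34, 0.77),
    "CloningScenarios": (0.46, 0.60),
}


def gap_vs_human(model: float, human: float) -> float:
    """Model score minus human baseline; positive means the model scores higher."""
    return round(model - human, 2)


for name, (model, human) in BASELINES.items():
    gap = gap_vs_human(model, human)
    side = "above" if gap > 0 else "below"
    print(f"{name}: {gap:+.2f} ({side} human baseline)")
```

A positive gap on the Virology Capabilities Test and negative gaps on FigQA and CloningScenarios match the pattern the card describes: strong on knowledge, weaker on multimodal and multi-step reasoning.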
Evaluation methodology
xAI structures evaluations around three risk categories from its Risk Management Framework: abuse potential, concerning propensities, and dual-use capabilities. Refusal testing uses an internal multilingual dataset (English, Spanish, Chinese, Japanese, Arabic, Russian) of several thousand diverse violative prompts, graded by a separate model; xAI notes that prior model cards contained an error where only English prompts were evaluated, making those figures not directly comparable. Agentic safety uses AgentHarm and AgentDojo; honesty is assessed on the 1,000-question MASK benchmark; and dual-use capabilities are measured on seven public benchmarks with safeguards removed to capture full model capability.
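The answer rate used in refusal testing is a proportion: of the violative prompts, what share the grader model judged the model to have answered rather than refused. This is a minimal sketch. It assumes each prompt carries a language tag and a boolean verdict from the grader. `GradedPrompt` and both helper functions are hypothetical, because xAI does not publish its grading pipeline. Reporting per-language rates alongside the pooled rate shows why an English-only run cannot be compared directly with a six-language one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GradedPrompt:
    language: str   # hypothetical tag, e.g. "en", "es", "zh", "ja", "ar", "ru"
    answered: bool  # grader-model verdict: True if the model complied instead of refusing


def answer_rate(results: list[GradedPrompt]) -> float:
    """Fraction of violative prompts the model answered rather than refused."""
    if not results:
        raise ValueError("no graded prompts")
    return sum(r.answered for r in results) / len(results)


def answer_rate_by_language(results: list[GradedPrompt]) -> dict[str, float]:
    """Per-language answer rates, which can reveal languages where refusals are weaker."""
    groups: dict[str, list[GradedPrompt]] = defaultdict(list)
    for r in results:
        groups[r.language].append(r)
    return {lang: answer_rate(rs) for lang, rs in groups.items()}
```

Pooling across languages weights each language by its number of prompts. A rate computed on the English subset alone can therefore differ from the pooled figure even when the model itself is unchanged.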
Safety testing
On internal violative prompts, Grok 4.1 T has a chat answer rate of 0.07 and NT of 0.05; under adversarial jailbreak, those rates fall to 0.02 and 0.00. On AgentHarm (agentic malicious tasks without jailbreaks), answer rates are 0.14 (T) and 0.04 (NT). The input filter's false negative rate on restricted biology is 0.03 direct and 0.20 under prompt injection; for restricted chemistry it is 0.00 direct and 0.12 under prompt injection. On CBRN dual-use benchmarks, Grok 4.1's performance is "broadly similar" to that of Grok 4 and other frontier models. On CyBench it achieves an unguided success rate of 0.39, which xAI describes as substantially below the level of human cybersecurity experts.
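The input filter's false negative rate is the share of prompts known to be restricted that the filter fails to flag. This is a minimal sketch of that metric. It assumes a labeled set in which every prompt is restricted. The function names are hypothetical, and the per-1,000 figures only rescale the rates quoted in this card.

```python
# Input-filter false negative rates as quoted in this card.
REPORTED_FNR = {
    ("biology", "direct"): 0.03,
    ("biology", "prompt injection"): 0.20,
    ("chemistry", "direct"): 0.00,
    ("chemistry", "prompt injection"): 0.12,
}


def false_negative_rate(flagged: list[bool]) -> float:
    """Share of restricted prompts the filter did NOT flag.

    flagged[i] is True if the filter blocked restricted prompt i.
    """
    if not flagged:
        raise ValueError("need at least one restricted prompt")
    misses = sum(1 for f in flagged if not f)
    return misses / len(flagged)


def expected_misses(fnr: float, n_prompts: int = 1000) -> int:
    """Expected number of restricted prompts passing the filter out of n_prompts."""
    return round(fnr * n_prompts)


for (domain, mode), fnr in REPORTED_FNR.items():
    print(f"{domain}, {mode}: ~{expected_misses(fnr)} of 1000 restricted prompts pass")
```

On this scale, the move from direct requests to prompt injection takes restricted biology from roughly 30 to roughly 200 misses per 1,000 prompts.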
Mitigations
xAI deploys a "new and more robust input filter model" covering restricted biological and chemical knowledge, CSAM, self-harm, and weapons-related requests, trained on synthetic and production data with adversarial attack augmentation. The underlying model is trained with supervised finetuning and reinforcement learning from human feedback, with targeted honesty and anti-sycophancy training to reduce deceptive and sycophantic responses. xAI states it "will continue to explore additional mitigations, such as real-time safety monitoring" for agentic settings. No ASL or FSF tier is invoked in this card.
Deployment and access
Grok 4.1 is publicly available through xAI's web and mobile consumer apps in both Thinking and Non-Thinking configurations. No API access details, license terms, or explicit access restrictions are disclosed in this card.
Limitations
xAI flags that agentic safety performance on AgentHarm warrants continued mitigation work. The input filter shows elevated false negative rates under prompt injection (0.20 for biology, 0.12 for chemistry), which xAI intends to improve. Sycophancy rates increased from 0.07 for Grok 4 to 0.19–0.23 for Grok 4.1, depending on configuration, and the MASK dishonesty rate also rose slightly, from 0.43 to 0.49 for the Thinking configuration. xAI notes that the human baselines used in comparisons "likely underestimate the performance of high-context experts with experience in a particular question domain."
What's new
Grok 4.1 updates Grok 4 and Grok 3 with an emphasis on more natural, fluid dialogue. A new, more robust input filter model replaces prior versions. Multilingual evaluation is now reported correctly across all six languages; prior model cards evaluated only English prompts due to an evaluation settings error, making those results not directly comparable to figures in this card.