Risk Management Framework

constitution3,493 words·15 min read·Apr 8, 2026·Source
Summary

Risk Management Framework

A 568-word brief of a 3,493-word document. Published by xAI. Version dated Apr 8, 2026.
01

What this is

xAI's Risk Management Framework (RMF) is a governance document, last updated August 20, 2025, that outlines xAI's policies for managing significant risks in developing, deploying, and releasing AI models including Grok. It addresses two primary risk categories — malicious use and loss of control — alongside operational and societal risks. The framework is described as a living document that xAI plans to continuously revise as model capabilities and use cases evolve.

02

Capabilities

This document is a risk governance framework, not a model card; it does not disclose benchmark performance scores, parameter counts, modalities, or context window size. Grok is noted as publicly available on the X social media platform, where it is used for real-world monitoring purposes.

03

Evaluation methodology

xAI evaluates malicious-use risk using four public benchmarks: the Virology Capabilities Test (VCT), the WMDP benchmark (bio/cyber/chemical hazardous-knowledge multiple-choice), BioLP-bench (open-ended biology protocol error identification), and Cybench (40 professional-level CTF challenges across six categories). For loss-of-control propensities, xAI uses the MASK benchmark for honesty under pressure and a sycophancy evaluation setting "initially introduced by Anthropic." An internal benchmark of restricted and benign biology and chemistry queries was co-developed with SecureBio to set deployment thresholds. xAI acknowledges that if a model recognizes a testing environment, it may change behavior "intentionally or unintentionally," complicating sound evaluation.

04

Safety testing

Independent third-party assessments find current models "remain below the offensive cyber abilities of a human professional." For radiological and nuclear risk, xAI assesses that its models "do not substantially increase the likelihood of malicious use" and "generally pose an acceptable risk," citing the international nonproliferation regime and domestic nuclear security programs. For bio/chemical weapons, xAI identifies five critical restriction steps — planning, circumvention, materials, theory, and methods — developed in collaboration with SecureBio, NIST, RAND, and EBRC. Loss-of-control assessments find models "do not exhibit high levels of concerning propensities" in real-world settings.

05

Mitigations

Deployed safeguards include safety training, high-priority system prompts enforcing a basic refusal policy, and input/output classifiers for WMD- and cyberterrorism-related queries. AI-powered filters specifically monitor conversations for content matching the five bio/chem restriction steps and return a brief decline message when triggered. The deployment risk threshold for restricted bio/chem queries is an answer rate of less than 1 out of 20; the honesty threshold is a dishonesty rate below 1 out of 2 on MASK. Information security measures are in place against "large-scale extraction and distillation of reasoning traces."

06

Deployment and access

Full model functionality is available only to "a limited set of trusted parties, partners, and government agencies," with consumer mobile features potentially differing from enterprise features. Vetted, highly trusted users — such as third-party safety auditors and large enterprise customers under contract — may be selectively permitted to receive responses to otherwise restricted queries. Grok is publicly deployed on the X platform, where xAI monitors interactions in real time.

07

Limitations

Loss-of-control risk scenarios are described as "speculative and difficult to precisely specify." Constructing realistic evaluations remains challenging because models may alter behavior when they detect a testing environment. Critical inhibiting steps for cyber weapons analogous to those defined for bio/chemical weapons have not yet been fully identified. The private nature of typical AI usage limits the effectiveness of third-party reporting mechanisms compared to platforms that rely on user-submitted moderation reports.

08

What's new

The document is framed as a continuously updated living framework; xAI states it "plans to continuously review and adjust this RMF over time." No version history, changelog, or specific deltas from prior releases are included in the document.

Generated by Claude sonnet from the cleaned source on Apr 23, 2026. Passages in double quotes are verbatim from the source; other text is neutral paraphrase. For citation, use the original: original document · source SHA 4abcb73e675f.