Summary

Practices for Governing Agentic AI Systems

A 606-word brief of a 12,785-word document. Published by OpenAI. Version dated Apr 8, 2026.

What this is

"Practices for Governing Agentic AI Systems" is a white paper from OpenAI, dated April 8, 2026, with lead authors Yonadav Shavit, Sandhini Agarwal, and Miles Brundage. It proposes a definition of agentic AI systems, identifies the three human parties in the agent life-cycle (model developer, system deployer, and user), and offers seven initial practices intended as building blocks for agreed baseline best practices. The paper does not describe or release a specific AI model.

Capabilities

This document is a governance white paper, not a model card; it reports no benchmark scores, parameter counts, or context windows. It defines agenticness along four dimensions: goal complexity, environmental complexity, adaptability, and independent execution. The paper focuses on language-model-based agentic systems as the primary driver of recent progress, distinguishing agenticness from consciousness, moral patienthood, or self-motivation.

Evaluation methodology

The paper identifies evaluation of agentic systems as a nascent field with more questions than answers. It recommends decomposing agent tasks into subtasks and evaluating each independently, with priority on high-risk actions such as financial transactions. End-to-end evaluation in conditions as close as possible to the deployment environment is described as currently the best available approach, because subtask-level reliability does not guarantee reliable action chaining, and real-world deployment involves a long tail of unanticipated events.
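The gap between subtask reliability and end-to-end reliability can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is not from the paper: each step succeeds independently of the others. The function name and the 99% figure are hypothetical. Real agent chains may behave worse than this, since errors can compound and deployment brings unanticipated events.

```python
def chain_success_probability(step_success_rates):
    """Probability that every step succeeds, assuming the steps are independent."""
    probability = 1.0
    for rate in step_success_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each success rate must be between 0 and 1")
        probability *= rate
    return probability


# Hypothetical numbers: ten subtasks that each pass 99% of the time in isolation.
ten_step_chain = chain_success_probability([0.99] * 10)
print(f"{ten_step_chain:.3f}")  # prints 0.904
```

Even under this optimistic assumption, ten steps that each look reliable in isolation complete together only about nine times in ten, which is one reason subtask scores alone may overstate how dependable the full task is.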

Safety testing

The paper states that frontier model developers "could test their models for capabilities that would facilitate harm such as generating individualized propaganda or assisting in cyberattacks," citing OpenAI's Preparedness work as an existing commitment. No specific red-team results, CBRN evaluations, or quantitative capability thresholds are reported. The authors explicitly state that "these practices alone are insufficient for fully mitigating the risks from present day AI systems, let alone mitigating catastrophic risks from advanced AI."

Mitigations

The paper proposes seven practices forming a "defense-in-depth" approach: evaluating suitability for the task; constraining the action-space and requiring human approval for high-stakes or irreversible actions; setting conservative default behaviors (e.g., "users prefer if I don't spend their money"); making agent activity legible via chain-of-thought traces and action ledgers; deploying automatic AI-based monitoring of primary agent reasoning; enabling attributability through unique agent identifiers tied to a human principal; and ensuring interruptibility with graceful shutdown at any time. The paper notes that hard-coded restrictions may become less effective as agenticness increases, since sufficiently capable agents could circumvent them by causing other parties to take disallowed actions.
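Several of these practices can be pictured together in a few lines of code. The following is a minimal sketch, not an implementation from the paper: the class `GovernedAgent`, its fields, and the action names are all hypothetical. It combines a constrained action space, human approval for high-stakes actions, an action ledger, a unique agent identifier tied to a human principal, and an interrupt flag.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GovernedAgent:
    """Hypothetical wrapper; none of these names come from the paper."""
    principal: str                        # human accountable for the agent
    allowed_actions: set                  # the constrained action space
    high_stakes_actions: set              # actions that need human approval
    approve: Callable[[str, str], bool]   # asks a human: (action, detail) -> bool
    agent_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ledger: List[dict] = field(default_factory=list)
    interrupted: bool = False

    def interrupt(self) -> None:
        """Stop the agent; every later request is refused."""
        self.interrupted = True

    def act(self, action: str, detail: str) -> str:
        """Decide whether an action may proceed and record the decision."""
        if self.interrupted:
            outcome = "refused: interrupted"
        elif action not in self.allowed_actions:
            outcome = "refused: outside action space"
        elif action in self.high_stakes_actions and not self.approve(action, detail):
            outcome = "refused: approval denied"
        else:
            # A real system would dispatch the action here; this sketch only logs it.
            outcome = "executed"
        self.ledger.append({
            "agent_id": self.agent_id,
            "principal": self.principal,
            "action": action,
            "detail": detail,
            "outcome": outcome,
        })
        return outcome


agent = GovernedAgent(
    principal="alice@example.com",
    allowed_actions={"read_calendar", "send_payment"},
    high_stakes_actions={"send_payment"},
    approve=lambda action, detail: False,  # stand-in for a real human prompt; always denies
)
results = [
    agent.act("read_calendar", "today"),
    agent.act("send_payment", "$40 to Bob"),
    agent.act("delete_files", "/tmp"),
]
agent.interrupt()
results.append(agent.act("read_calendar", "tomorrow"))
print(results)
```

Every request, including refused ones, lands in the ledger with the agent identifier and principal attached, so the record supports both legibility and attributability. A gate like this is exactly the kind of hard-coded restriction that the paper warns may weaken as agenticness increases.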

Deployment and access

This is a white paper, not a model release; it specifies no license, API surface, or access tier. The paper describes scenarios where a single entity occupies multiple life-cycle roles — such as OpenAI serving as both model developer and one of the system deployers for its Assistants API. It calls for industry-wide collaboration and additional governance frameworks, including potential legislation, regulation, insurance, and contract structures, to address indirect impacts beyond what individual deployment practices can cover.

Limitations

The paper explicitly states that its seven practices do not cover cybersecurity of agents against hijacking, which the authors expect to be "a significant challenge that requires new practices." It also notes that "the science required to predict the capabilities/user-alignment of an AI model given training choices is in its infancy," so model developers cannot currently give deterministic behavioral guarantees. Each of the seven practice sections includes multiple open questions that the authors say must be resolved before practices can be codified. Four categories of indirect risk (adoption races, labor displacement, shifted offense-defense balances, and correlated failures from algorithmic monoculture) are flagged as likely requiring governance strategies beyond the paper's scope.

What's new

The document does not reference a prior version and includes no changelog. It presents itself as an initial white paper intended to catalyze broader societal discussion rather than a finalized or updated standard.