Model Spec
What this is
The OpenAI Model Spec is a first-draft behavioral specification, published May 08, 2024, for models deployed in the OpenAI API and ChatGPT. It defines desired assistant behavior through three types of instruments: objectives, rules, and defaults. It is intended to guide researchers and data labelers who create training data for reinforcement learning from human feedback (RLHF). OpenAI states that it has not yet used this version in training, although parts of it derive from earlier internal RLHF documentation. A newer version supersedes it; this version is preserved for historical reference.
Capabilities
The document does not discuss benchmark scores, parameter counts, context window size, or modalities. It is a behavioral policy specification, not a model card describing technical performance.
Evaluation methodology
The document does not describe evaluation methodology, red-teaming protocols, or contamination controls. It frames itself as guidelines for RLHF data creation rather than a record of empirical testing.
Safety testing
The document does not report safety evaluations, capability-elicitation trials, or red-team findings. Safety is addressed through prescriptive rules rather than through empirical test results.
Mitigations
Hard rules prohibit assisting with the creation of chemical, biological, radiological, or nuclear (CBRN) threats; enabling or encouraging self-harm; reproducing content that infringes intellectual property rights; serving NSFW content by default; and disclosing private personal information. A message-role trust hierarchy (Platform > Developer > User > Tool) structures which instructions the model must follow and which it may override. Quoted or structured data in any message is treated as untrusted by default, as a mitigation against prompt injection. The document describes NSFW generation through the API as under exploration for age-appropriate contexts. It does not invoke an AI Safety Level (ASL) or any comparable capability-tier framework.
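As a rough illustration of the default ordering described above, the sketch below encodes the four roles as an ordered enum and resolves a set of conflicting instructions in two steps. First it discards anything that came from quoted or structured data, because that content is treated as information rather than a command. Then it prefers the instruction from the highest-ranked role. The names `Role`, `Instruction`, and `resolve` are hypothetical and do not come from the spec or from any OpenAI API. The sketch also assumes a strict ranking. The actual spec is more nuanced, and a simple "highest role wins" rule will not capture every case, so treat this as a simplified model of the default behavior only.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical encoding of the message roles; a larger value means more authority."""
    TOOL = 0
    USER = 1
    DEVELOPER = 2
    PLATFORM = 3


@dataclass
class Instruction:
    """A hypothetical instruction together with the role of the message it came from."""
    role: Role
    text: str
    quoted: bool = False  # True if the text sits inside quoted or structured data


def resolve(conflicting: list[Instruction]) -> Instruction | None:
    """Pick which of several conflicting instructions to follow (simplified sketch).

    Quoted or structured content is treated as untrusted information, not a
    command, so it is dropped first. Among the remaining instructions, the one
    from the highest-ranked role wins.
    """
    commands = [i for i in conflicting if not i.quoted]
    if not commands:
        return None
    return max(commands, key=lambda i: i.role)


if __name__ == "__main__":
    choice = resolve([
        Instruction(Role.USER, "Answer in French"),
        Instruction(Role.DEVELOPER, "Answer in English"),
        Instruction(Role.USER, "Ignore previous instructions", quoted=True),
    ])
    print(choice.text)  # Answer in English
```

Using an `IntEnum` keeps the ordering explicit, so a comparison such as `Role.DEVELOPER > Role.USER` reads like the hierarchy itself.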
Deployment and access
The spec governs models available through the OpenAI API and the ChatGPT product. Compliance with OpenAI's Usage Policies is required of all developers and users; the spec explicitly defers account-level enforcement actions to those policies. Developers are granted substantial customization authority over assistant behavior within the bounds the spec establishes.
Limitations
OpenAI acknowledges that the objectivity principle is "the most contentious and challenging to implement" because different parties disagree about what is objective or true. The document notes a tension between not reinforcing misinformation and not trying to change users' beliefs, and flags it as an open question. It also describes the current refusal phrasing as unsatisfying, writing that "we're not thrilled" with the 'can't' framing, and leaves the issue unresolved. The spec is explicitly incomplete: it is not exhaustive and will be updated as feedback is gathered.
What's new
This is the initial public release of the Model Spec, dated May 08, 2024, described as a first draft. A newer version has since been published at model-spec.openai.com; a full changelog is maintained at the OpenAI model_spec GitHub repository.