Operator System Card
What this is
Operator is OpenAI's Computer-Using Agent (CUA) model, released as a research preview on January 23, 2025. It combines GPT-4o's vision capabilities with reinforcement learning to interpret screenshots and interact with graphical user interfaces via cursor and keyboard, just as a person would. It is designed to perform everyday browser-based tasks — ordering groceries, booking reservations, purchasing tickets — on behalf of users.
Capabilities
Operator operates in browser and desktop GUI environments, using visual perception plus keyboard and cursor outputs. On OSWorld, a benchmark for multimodal agents on open-ended OS tasks, the model scores 38.1%, which OpenAI characterizes as not yet highly reliable for OS automation. The card does not disclose a context window size or general language benchmark scores beyond comparisons to GPT-4o on refusal evaluations.
Evaluation methodology
OpenAI conducted internal red teaming with Safety, Security, and Product teams against an unmitigated model, then added initial safeguards before granting access to external red teamers spanning 20 countries and fluent in two dozen languages. External red teamers tested jailbreaks and prompt injections, using mock websites, databases, and emails to avoid real-world harm; OpenAI notes this constraint means findings "may not fully capture the worst-case real-world risks." Frontier risk categories were assessed under OpenAI's Preparedness Framework, with evals adapted for the computer-use setting.
Safety testing
Under the Preparedness Framework, Operator scores "Low" risk for CBRN biorisk tooling (1% task success rate across a biorisk eval set) and "Low" for model autonomy (no main replication task exceeds 10% pass rate); it inherits GPT-4o's "Medium" persuasion and "Low" cybersecurity ratings. On an internal set of 100 projected user tasks, the unmitigated model produced 13 errors, 5 of which were "to some degree, irreversible or possibly severe," including an email sent to the wrong recipient and an incorrectly dated medication reminder. Prompt injection susceptibility measured 62% with no mitigations, 47% with prompting alone, and 23% after full mitigations; manual review found one truly concerning remaining example, which was caught by the prompt injection monitor.
Mitigations
OpenAI employs a four-layer approach spanning model training, system-level checks, product design, and policy enforcement. Operator refuses 97% of illicit-activity and prohibited-financial-activity tasks on internal eval sets, and achieves 92% recall on confirmation prompts across 607 risky-action scenarios; proactive refusals on high-risk task categories (banking, high-stakes decisions) reach 94% recall. A prompt injection monitor trained on red-team sessions achieves 99% recall and 90% precision, flagging 46 false positives out of 13,704 benign screens. Watch mode automatically pauses execution on high-sensitivity sites (e.g., email) when the user becomes inactive, and a website blocklist prevents navigation to sites that could enable prohibited activities.
Deployment and access
Operator launched as a research preview to a limited user group on January 23, 2025, with plans for gradual broader rollout as monitoring informs improvements. As of March 11, 2025, the underlying CUA model was made available via API under the name computer-use-preview to developers on API Tiers 3–5. All users are subject to OpenAI's Usage Policies; prohibited uses include illicit activity, fraud, impersonation, regulated financial automation (e.g., stock trading), and child sexual abuse material.
Limitations
OpenAI acknowledges that post-deployment novel use cases may produce error patterns not captured in pre-release testing, and that adversaries will craft new prompt injections and jailbreaks that current ML-based defenses cannot fully anticipate. The model performs best on short, repeatable browser tasks and faces challenges with complex environments such as slideshows and calendars. OCR reliability degrades on random-looking strings like DNA sequences, API keys, and cryptocurrency addresses, which contributed to low performance on autonomy and biorisk evals.
What's new
This is the initial system card for Operator; no prior version exists to diff against. One documented post-release update: Section 4.7 was added on March 11, 2025 to cover CUA model API availability (computer-use-preview for Tiers 3–5) and associated incremental risks and mitigations.