xAI API Documentation
What this is
This is xAI's Inference API reference documentation, published April 17, 2026 and last updated April 13, 2026. It covers three endpoint groups under the chat inference surface: chat completions, a Responses API (create, retrieve, delete), and deferred chat completions. The document is an API reference, not a model card; it describes endpoint contracts rather than a specific named model.
Capabilities
The /v1/chat/completions endpoint accepts text and image prompts and targets "chat and image understanding models." The Responses API supports tool calling via functions and web search, with a maximum of 128 tools per request and optional parallel tool call execution. No benchmark scores, parameter counts, or context window sizes are stated in this document.
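As a rough illustration of the tool limit described above, the following sketch builds a Responses API request body and checks the 128-tool ceiling on the client side before anything is sent. The field names model, input, tools, and parallel_tool_calls, the tool object shape, and the model name in the usage note are assumptions borrowed from common Responses-style payloads; they are not confirmed by this document.

```python
from typing import Any

MAX_TOOLS = 128  # per-request tool limit stated in the reference


def build_responses_payload(
    model: str,
    user_input: str,
    tools: list[dict[str, Any]],
    parallel_tool_calls: bool = True,
) -> dict[str, Any]:
    """Build a request body for POST /v1/responses.

    The key names used here are assumptions for illustration only.
    """
    if len(tools) > MAX_TOOLS:
        raise ValueError(
            f"{len(tools)} tools supplied; the documented maximum is {MAX_TOOLS}"
        )
    return {
        "model": model,
        "input": user_input,
        "tools": list(tools),  # copy so later caller mutations do not leak in
        "parallel_tool_calls": parallel_tool_calls,
    }
```

A call such as build_responses_payload("example-model", "What is the weather?", [weather_tool]) would return a dictionary ready to be serialized as JSON, while a list of 129 tools would raise ValueError before any network call is made.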
Evaluation methodology
Not disclosed in this document.
Safety testing
Not disclosed in this document.
Mitigations
Not disclosed in this document.
Deployment and access
The API exposes five endpoints: POST /v1/chat/completions, POST /v1/responses, GET /v1/responses/{response_id}, DELETE /v1/responses/{response_id}, and GET /v1/chat/deferred-completion/{request_id}, which is used to poll asynchronous requests. Responses generated via the Responses API are stored for 30 days and then permanently deleted; the store parameter defaults to true. A service_tier field, whose default value is "default", controls the processing tier. No licensing terms, pricing, or access eligibility criteria are described.
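The endpoint list and retention rule above can be sketched as a small lookup table. This is a minimal sketch, not a client: it only constructs request objects and never sends them. The base URL and the bearer-token Authorization header are assumptions that this document does not state, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.request import Request

BASE_URL = "https://api.x.ai"  # assumption: base URL is not given in the reference

# Method and path for each endpoint listed in the reference.
ENDPOINTS = {
    "create_chat_completion": ("POST", "/v1/chat/completions"),
    "create_response": ("POST", "/v1/responses"),
    "get_response": ("GET", "/v1/responses/{response_id}"),
    "delete_response": ("DELETE", "/v1/responses/{response_id}"),
    "get_deferred_completion": ("GET", "/v1/chat/deferred-completion/{request_id}"),
}

RETENTION = timedelta(days=30)  # stored responses are deleted after 30 days


def build_request(name: str, api_key: str, **path_params: str) -> Request:
    """Return an unsent Request for the named endpoint (hypothetical helper)."""
    method, template = ENDPOINTS[name]
    url = BASE_URL + template.format(**path_params)
    headers = {"Authorization": f"Bearer {api_key}"}  # assumption: bearer auth
    return Request(url, headers=headers, method=method)


def deletion_time(created_at: datetime) -> datetime:
    """Earliest time a stored response is expected to be deleted."""
    return created_at + RETENTION


if __name__ == "__main__":
    req = build_request("get_deferred_completion", "YOUR_API_KEY", request_id="req_123")
    print(req.get_method(), req.full_url)
    print(deletion_time(datetime(2026, 4, 13, tzinfo=timezone.utc)))
```

For the two POST endpoints the caller would still need to attach a JSON body and a Content-Type header before sending; those details are left out because the request schema is not reproduced here.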
Limitations
The document explicitly marks frequency_penalty and presence_penalty as "NOT SUPPORTED in Responses API," noting that they exist only for compatibility. The background field is listed as "not used at the moment" and is present solely for OpenResponses compatibility. The truncation strategy defaults to disabled, and no other options are described. No other limitations are stated.
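One way a caller might account for these compatibility-only fields is to strip them before building a Responses API request, so that code ported from another client does not silently rely on them. The sketch below is a client-side convention, not behavior described by the reference; the function name is hypothetical, and whether the server ignores or rejects these fields is not stated in this document.

```python
from typing import Any

# Marked "NOT SUPPORTED in Responses API" in the reference.
UNSUPPORTED_FIELDS = ("frequency_penalty", "presence_penalty")
# Listed as "not used at the moment" in the reference.
UNUSED_FIELDS = ("background",)


def sanitize_responses_payload(
    payload: dict[str, Any],
) -> tuple[dict[str, Any], list[str]]:
    """Return a copy of payload without compatibility-only fields.

    Also returns the sorted names of the fields that were dropped, so the
    caller can log or warn about them.
    """
    ignored = set(UNSUPPORTED_FIELDS) | set(UNUSED_FIELDS)
    cleaned = {key: value for key, value in payload.items() if key not in ignored}
    dropped = sorted(key for key in payload if key in ignored)
    return cleaned, dropped
```

Returning the dropped names instead of discarding them quietly lets the caller surface a warning when a ported request depended on a penalty setting that has no effect here.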
What's new
The document carries a last-updated date of April 13, 2026. No changelog entries or version deltas are included in the source text.