Mistral Large 2 Release
What this is
Mistral Large 2 is Mistral AI's second-generation flagship language model, announced July 24, 2024. It supersedes Mistral Large and carries 123 billion parameters, designed for single-node inference with long-context applications. Its stated purpose is to advance cost-efficient performance in code generation, mathematics, reasoning, and multilingual use cases.
Capabilities
The pretrained model achieves 84.0% accuracy on MMLU, which Mistral claims sets a new point on the performance/cost Pareto front among open models. It supports a 128k context window, dozens of natural languages including French, German, Arabic, Chinese, Japanese, and Korean, and 80+ coding languages including Python, Java, C++, and Bash. On code and math benchmarks (MultiPL-E, GSM8K 8-shot, MATH 0-shot no CoT), Mistral reports performance on par with GPT-4o, Claude 3 Opus, and Llama 3 405B. The model also supports parallel and sequential function calling for use in complex agentic pipelines.
Evaluation methodology
Mistral evaluated the model across general, code, math, alignment, and multilingual benchmarks using a shared internal evaluation pipeline; comparisons against external models were run through the same pipeline except where the source notes a "paper" row in MultiPL-E results. Alignment was measured on MT-Bench, Wild Bench, and Arena Hard. Math reasoning was assessed on GSM8K (8-shot) and MATH (0-shot, no chain-of-thought). Multilingual performance was measured on multilingual MMLU against the base pretrained model. No details on contamination controls or external auditors are disclosed.
Safety testing
The card does not discuss red-teaming, catastrophic-risk evaluations, or CBRN/cyber/autonomy assessments. The document's only safety-adjacent claim is that training was designed to reduce hallucinations and that the model was fine-tuned to acknowledge when it lacks sufficient information.
Mitigations
Mistral states the model was fine-tuned to be "more cautious and discerning in its responses" to reduce hallucination. The model is trained to surface uncertainty rather than generate confident but incorrect outputs. No classifier thresholds, content filters, ASL/FSF tiers, or refusal-training details are disclosed.
Deployment and access
Mistral Large 2 is available immediately on la Plateforme under the API identifier mistral-large-2407 and on the le Chat interface. Instruct-model weights are available for download and hosted on HuggingFace. The model is released under the Mistral Research License for non-commercial use; commercial self-deployment requires a separate Mistral Commercial License. Cloud access is available through Google Cloud Vertex AI (Managed API), Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai. Fine-tuning on la Plateforme is extended to Mistral Large, Mistral Nemo, and Codestral starting on the release date.
Limitations
The card does not explicitly enumerate unresolved limitations or failure modes. The only acknowledged gap is a prior tendency to hallucinate, which training partially addressed; whether the mitigation is complete is not quantified.
What's new
Mistral Large 2 is versioned 24.07 under Mistral's YY.MM scheme, replacing the prior Mistral Large. Key deltas over its predecessor include substantially higher code benchmark scores, improved multilingual MMLU results, enhanced instruction-following in long multi-turn conversations, and added parallel and sequential function-calling support. Mistral also announces a platform consolidation to two general-purpose models (Mistral Nemo and Mistral Large) and two specialist models (Codestral and Embed), with older Apache models remaining available for self-hosted deployment and fine-tuning.