---
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
library_name: vllm
inference: false
base_model:
- mistralai/Devstral-Small-2507
- unsloth/Devstral-Small-2507
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
---
# Quantization NVFP4A16
Quantized from https://huggingface.co/unsloth/Devstral-Small-2507 (chosen because that repository includes the tokenizer files in the model folder).
Compressed with [llm-compressor](https://github.com/vllm-project/llm-compressor).

We recommend NVIDIA Blackwell hardware (CUDA compute capability 10.0 or higher: RTX 50 series GPUs, DGX Spark, B200, ...) for native FP4 acceleration.
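
NVFP4A16 means the weights of the quantized layers are stored in NVIDIA's 4-bit floating-point format (NVFP4), while activations stay in 16-bit precision. As an illustration only, the sketch below fake-quantizes a list of weights the way NVFP4 works conceptually: values are grouped into blocks of 16, each block gets its own scale, and every scaled value is rounded to the nearest FP4 (E2M1) number. It is a simplification, not llm-compressor's implementation: the per-block scale is kept as a Python float here, whereas real NVFP4 encodes it in FP8 (E4M3) together with a per-tensor FP32 scale.

```python
# Illustrative sketch of NVFP4-style block quantization (weights only, as in "A16").
# Simplifying assumptions: the per-block scale stays a Python float, whereas
# real NVFP4 stores it in FP8 (E4M3) plus a per-tensor FP32 scale.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # non-negative FP4 (E2M1) values
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive weights
FP4_MAX = 6.0    # largest E2M1 magnitude


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Return (scale, codes) where each code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    # Map the block's largest magnitude onto the largest FP4 value.
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        # Snap |x| / scale to the nearest representable E2M1 magnitude.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    """Recover approximate weights from a scale and its FP4 codes."""
    return [scale * c for c in codes]


def fake_quantize(weights: list[float]) -> list[float]:
    """Quantize then dequantize a flat weight list, block by block."""
    out: list[float] = []
    for start in range(0, len(weights), BLOCK_SIZE):
        scale, codes = quantize_block(weights[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    w = [0.01 * i - 0.08 for i in range(32)]
    wq = fake_quantize(w)
    err = max(abs(a - b) for a, b in zip(w, wq))
    print(f"max abs error: {err:.4f}")
```

Because each block has its own scale, an outlier only degrades the precision of the 15 weights sharing its block rather than the whole tensor.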

# Devstral Small 1.1

Devstral is an agentic LLM for software engineering tasks, built through a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, editing multiple files and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this [benchmark](#benchmark-results).

It is fine-tuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503), and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning.

For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral-2507).

**Updates compared to [`Devstral Small 1.0`](https://huggingface.co/mistralai/Devstral-Small-2505):**
- Improved performance, please refer to the [benchmark results](#benchmark-results).
- `Devstral Small 1.1` is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments. 
- Supports [Mistral's function calling format](https://mistralai.github.io/mistral-common/usage/tools/).


## Key Features:
- **Agentic coding**: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- **Lightweight**: thanks to its compact quantized size, Devstral NVFP4A16 is light enough to run on a single RTX 5060 Ti 16GB, making it an appropriate model for local deployment and on-device use.
- **Apache 2.0 License**: Open license allowing usage and modification for both commercial and non-commercial purposes.
- **Context Window**: A 128k context window.
- **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.
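
The single-GPU claim can be sanity-checked with rough arithmetic. The sketch below is a back-of-envelope estimate under stated assumptions: about 24B parameters (taken from the `Mistral-Small-3.1-24B` base model name), every parameter stored as 4-bit NVFP4 plus one 8-bit scale per 16-weight block, and no allowance for the KV cache, activations or layers left unquantized, so real memory use will be higher. The helper names are hypothetical.

```python
# Back-of-envelope weight-memory estimate (illustrative assumptions only):
# every parameter is quantized, NVFP4 costs 4 bits per weight plus one 8-bit
# scale per 16-weight block, and KV cache / activations are ignored.


def nvfp4_bits_per_weight(block_size: int = 16, scale_bits: int = 8) -> float:
    """Effective storage cost of one NVFP4 weight, including its share of the block scale."""
    return 4 + scale_bits / block_size


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 24e9  # assumed from the "24B" in the base model name
    bf16 = weight_memory_gb(n_params, 16)
    fp4 = weight_memory_gb(n_params, nvfp4_bits_per_weight())
    print(f"BF16: {bf16:.1f} GB, NVFP4 (approx.): {fp4:.1f} GB")
```

Under these assumptions the weights drop from roughly 48 GB in BF16 to about 13.5 GB in NVFP4, which leaves little headroom for the KV cache on a 16GB card.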


## Benchmark Results

The results below were obtained with the base (unquantized) model.

### SWE-Bench

Devstral Small 1.1 achieves a score of **53.6%** on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8 points and the second-best state-of-the-art model in the table (DeepSWE) by +11.4 points.

| Model              | Agentic Scaffold   | SWE-Bench Verified (%) |
|--------------------|--------------------|------------------------|
| Devstral Small 1.1 | OpenHands Scaffold | **53.6**               |
| Devstral Small 1.0 | OpenHands Scaffold | *46.8*                 |
| GPT-4.1-mini       | OpenAI Scaffold    | 23.6                   |
| Claude 3.5 Haiku   | Anthropic Scaffold | 40.6                   |
| SWE-smith-LM 32B   | SWE-agent Scaffold | 40.2                   |
| Skywork SWE        | OpenHands Scaffold | 38.0                   |
| DeepSWE            | R2E-Gym Scaffold   | 42.2                   |
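
The margins quoted above can be recomputed directly from the table. A small illustrative sketch (the helper name `margins_over` is hypothetical); because the scores are already percentages, the differences are percentage points.

```python
# Recompute the SWE-Bench Verified margins from the table values.
SCORES = {
    "Devstral Small 1.1": 53.6,
    "Devstral Small 1.0": 46.8,
    "GPT-4.1-mini": 23.6,
    "Claude 3.5 Haiku": 40.6,
    "SWE-smith-LM 32B": 40.2,
    "Skywork SWE": 38.0,
    "DeepSWE": 42.2,
}


def margins_over(model: str, scores: dict[str, float]) -> dict[str, float]:
    """Percentage-point lead of `model` over every other entry, rounded to 0.1."""
    return {
        other: round(scores[model] - score, 1)
        for other, score in scores.items()
        if other != model
    }


if __name__ == "__main__":
    m = margins_over("Devstral Small 1.1", SCORES)
    print(m["Devstral Small 1.0"], m["DeepSWE"])
```

Rounding to one decimal avoids floating-point noise such as `6.800000000000004` in the raw subtraction.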


When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as DeepSeek-V3-0324 and Qwen3 235B-A22B.

## Local Inference Usage

We recommend using Devstral NVFP4A16 with [`vLLM >= 0.9.1`](https://github.com/vllm-project/vllm/releases/tag/v0.9.1). Other methods are untested.

#### vLLM (recommended, other methods untested)

<details>
<summary>Expand</summary>

We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.

**_Installation_**
Make sure you install [`vLLM >= 0.9.1`](https://github.com/vllm-project/vllm/releases/tag/v0.9.1):

```
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
```

Also make sure you have installed [`mistral_common >= 1.7.0`](https://github.com/mistralai/mistral-common/releases/tag/v1.7.0).

```
pip install mistral-common --upgrade
```

To check:
```
python -c "import mistral_common; print(mistral_common.__version__)"
```
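
To check both minimum versions in one go, a minimal sketch using only the standard library (`importlib.metadata`) is shown below. The version parser is a simplifying assumption: it compares only the numeric release segments and ignores pre-release or local tags, so `packaging.version.Version` is the more robust choice when it is installed. The helper names are hypothetical.

```python
# Check installed package versions against the minimums stated in this card.
# Assumption: only numeric release segments are compared; tags such as "rc1",
# ".post1" or "+cu128" are dropped rather than ordered properly.
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUMS = {"vllm": "0.9.1", "mistral-common": "1.7.0"}


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '1.7.0' or '0.9.1.post1' into (1, 7, 0) / (0, 9, 1)."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:  # stop at the first non-numeric segment
            break
        parts.append(int(m.group()))
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Compare release tuples after padding them to the same length."""
    a, b = parse_release(installed), parse_release(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for dist, minimum in MINIMUMS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            print(f"{dist}: not installed (need >= {minimum})")
            continue
        status = "OK" if meets_minimum(installed, minimum) else "TOO OLD"
        print(f"{dist} {installed}: {status} (need >= {minimum})")
```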

You can also build a ready-to-go Docker image from the [Dockerfile](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pull it from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).

**_Launch server_**

We recommend that you use Devstral in a server/client setting.

1. Spin up a server:

```
vllm serve apolloparty/Devstral-Small-2507-NVFP4A16 --tool-call-parser mistral --enable-auto-tool-choice
```
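
Loading the model can take a while, so it can help to wait until the server responds before sending requests. A minimal, standard-library-only sketch that polls the OpenAI-compatible `/v1/models` endpoint exposed by `vllm serve`; the `localhost:8000` address and the helper names are assumptions for illustration.

```python
# Poll the vLLM OpenAI-compatible server until /v1/models answers.
# Assumptions: the server listens on localhost:8000 (adjust BASE_URL) and
# the helper names below are hypothetical.
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/v1"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def wait_for_server(base_url: str = BASE_URL, timeout_s: float = 600.0) -> list[str]:
    """Poll /v1/models until it responds, returning the served model ids."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                return parse_model_ids(json.load(resp))
        except OSError:  # URLError, refused connections and socket timeouts
            if time.monotonic() >= deadline:
                raise TimeoutError(f"server at {base_url} not ready after {timeout_s}s")
            time.sleep(5)
```

Calling `wait_for_server()` before the request snippet blocks until the server is up and returns the served model ids, which should include the model name passed to `vllm serve`.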


2. To query the server from a client, you can use a simple Python snippet.

```py
import requests
import json
from huggingface_hub import hf_hub_download


url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "apolloparty/Devstral-Small-2507-NVFP4A16"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "<your-command>",
            },
        ],
    },
]

data = {"model": model, "messages": messages, "temperature": 0.15}

# Devstral Small 1.1 supports tool calling. If you want to use tools, follow this:
# tools = [ # Define tools for vLLM
#     {
#         "type": "function",
#         "function": {
#             "name": "git_clone",
#             "description": "Clone a git repository",
#             "parameters": {
#                 "type": "object",
#                 "properties": {
#                     "url": {
#                         "type": "string",
#                         "description": "The url of the git repository",
#                     },
#                 },
#                 "required": ["url"],
#             },
#         },
#     }
# ] 
# data = {"model": model, "messages": messages, "temperature": 0.15, "tools": tools} # Pass tools to payload.

response = requests.post(url, headers=headers, data=json.dumps(data))
response.raise_for_status()  # surface HTTP errors (e.g. wrong URL or model name) before parsing
print(response.json()["choices"][0]["message"]["content"])
```
</details>
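
When `tools` are passed in the payload, the model may answer with tool calls instead of plain text, in which case `content` can be empty or `null`. A minimal sketch for reading them, assuming vLLM returns the OpenAI-compatible `tool_calls` structure with `arguments` as a JSON-encoded string; the helper name `extract_tool_calls` and the sample response are hypothetical.

```python
# Read tool calls from an OpenAI-compatible chat completion response.
# Assumption: "arguments" arrives as a JSON-encoded string, as in the
# OpenAI chat format that vLLM's tool-calling support follows.
import json
from typing import Any


def extract_tool_calls(response_json: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (function name, parsed arguments) for each tool call in the first choice."""
    message = response_json["choices"][0]["message"]
    calls = message.get("tool_calls") or []  # missing or None when no tool is called
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


if __name__ == "__main__":
    # Hand-written sample shaped like an OpenAI chat completion (hypothetical).
    sample = {
        "choices": [{
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": "call0",
                    "type": "function",
                    "function": {
                        "name": "git_clone",
                        "arguments": '{"url": "https://github.com/vllm-project/vllm"}',
                    },
                }],
            },
        }],
    }
    for name, args in extract_tool_calls(sample):
        print(name, args)
```

With the client snippet above, `extract_tool_calls(response.json())` would yield the name and arguments of, for example, the `git_clone` tool when the model decides to call it; executing the tool and sending its result back is left to the agent.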