unsloth
/

Magistral-Small-2506

+---
+base_model:
+- mistralai/Magistral-Small-2506
+- mistralai/Mistral-Small-3.1-24B-Instruct-2503
+license: apache-2.0
+pipeline_tag: text2text-generation
+tags:
+- mistral
+- unsloth
+language:
+- en
+- fr
+- de
+- es
+- pt
+- it
+- ja
+- ko
+- ru
+- zh
+- ar
+- fa
+- id
+- ms
+- ne
+- pl
+- ro
+- sr
+- sv
+- tr
+- uk
+- vi
+- hi
+- bn
+---
+> [!NOTE]
+> Magistral, enhanced with optional Vision support. <br> You should use `--jinja` to enable the system prompt in `llama.cpp`
+<div>
+  <p style="margin-bottom: 0; margin-top: 0;">
+    <strong>Learn to run Magistral correctly - <a href="https://docs.unsloth.ai/basics/magistral">Read our Guide</a>.</strong>
+  </p>
+<p style="margin-top: 0;margin-bottom: 0;">
+    <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves SOTA performance in model quantization.</em>
+  </p>
+  <div style="display: flex; gap: 5px; align-items: center; ">
+    <a href="https://github.com/unslothai/unsloth/">
+      <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
+    </a>
+    <a href="https://discord.gg/unsloth">
+      <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
+    </a>
+    <a href="https://docs.unsloth.ai/basics/magistral">
+      <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
+    </a>
+  </div>
+<h1 style="margin-top: 0rem;">✨ Run & Fine-tune Magistral with Unsloth!</h1>
+</div>
+- Fine-tune Mistral v0.3 (7B) for free using our Google [Colab notebook here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb)!
+- Read our Blog about Magistral support: [docs.unsloth.ai/basics/devstral](https://docs.unsloth.ai/basics/devstral)
+- View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
+# Model Card for mistralai/Magistral-Small-2506
+Devstral is an agentic LLM for software engineering tasks built under a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, edit multiple files, and power software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open source model on this [benchmark](#benchmark-results).
+It is finetuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503), so it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed before fine-tuning from `Mistral-Small-3.1`.
+For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
+Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).
+## Key Features:
+- **Agentic coding**: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
+- **Lightweight**: With its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
+- **Apache 2.0 License**: Open license allowing usage and modification for both commercial and non-commercial purposes.
+- **Context Window**: A 128k context window.
+- **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.
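As a rough sanity check on the hardware claim above, the weight footprint can be estimated as parameters × bits per weight ÷ 8. The sketch below is illustrative only: the function name `weight_memory_gb` is made up for this example, and the estimate ignores the KV cache, activations, and runtime overhead, so real memory use will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 24e9  # "24 billion parameters" from the feature list above

# Compare a few common precisions (weights only, no KV cache or overhead).
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB of weights")
```

Under these assumptions, 16-bit weights alone come to about 48 GB, which is more than the 24 GB of an RTX 4090, so a quantized checkpoint is the likely route on that class of hardware.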
+## Benchmark Results
+### SWE-Bench
+Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming the prior open-source SoTA by more than 6 percentage points.
+| Model            | Scaffold           | SWE-Bench Verified (%) |
+|------------------|--------------------|------------------------|
+| Devstral         | OpenHands Scaffold | **46.8**               |
+| GPT-4.1-mini     | OpenAI Scaffold    | 23.6                   |
+| Claude 3.5 Haiku | Anthropic Scaffold | 40.6                   |
+| SWE-smith-LM 32B | SWE-agent Scaffold | 40.2                   |
+When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 235B-A22B.
+![SWE Benchmark](assets/swe_bench.png)
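The margins implied by the table can be recomputed directly. This small sketch hard-codes the scores from the table above and reports Devstral's lead in percentage points; the names `scores` and `lead_over` are illustrative.

```python
# SWE-Bench Verified scores (%) copied from the table above.
scores = {
    "Devstral": 46.8,
    "GPT-4.1-mini": 23.6,
    "Claude 3.5 Haiku": 40.6,
    "SWE-smith-LM 32B": 40.2,
}


def lead_over(model: str, baseline: str) -> float:
    """Score difference in percentage points, rounded to one decimal place."""
    return round(scores[model] - scores[baseline], 1)


for name in scores:
    if name != "Devstral":
        print(f"Devstral vs {name}: +{lead_over('Devstral', name)} points")
```

Note that each row uses a different scaffold, so these differences compare whole systems rather than models in isolation.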
+## Usage
+We recommend using Devstral with the [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) scaffold.
+You can use it either through our API or by running locally.
+### API
+Follow these [instructions](https://docs.mistral.ai/getting-started/quickstart/#account-setup) to create a Mistral account and get an API key.
+Then run these commands to start the OpenHands docker container.
+```bash
+export MISTRAL_API_KEY=<MY_KEY>
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik
+mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2505","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands-state/settings.json
+docker run -it --rm --pull=always \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
+    -e LOG_ALL_EVENTS=true \
+    -v /var/run/docker.sock:/var/run/docker.sock \
+    -v ~/.openhands-state:/.openhands-state \
+    -p 3000:3000 \
+    --add-host host.docker.internal:host-gateway \
+    --name openhands-app \
+    docker.all-hands.dev/all-hands-ai/openhands:0.39
+```
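The one-line `echo` above is easy to mistype. As an alternative sketch, the same settings file can be written from Python with the `json` module. The keys and values mirror the shell command above; the helper name `write_openhands_settings` and its `state_dir` argument are illustrative and not part of OpenHands.

```python
import json
from pathlib import Path


def write_openhands_settings(api_key: str, state_dir: Path) -> Path:
    """Write settings.json with the same fields as the shell command above."""
    settings = {
        "language": "en",
        "agent": "CodeActAgent",
        "max_iterations": None,
        "security_analyzer": None,
        "confirmation_mode": False,
        "llm_model": "mistral/devstral-small-2505",
        "llm_api_key": api_key,
        "remote_runtime_resource_factor": None,
        "github_token": None,
        "enable_default_condenser": True,
    }
    # Create the state directory if needed, then serialize the settings as JSON.
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / "settings.json"
    path.write_text(json.dumps(settings), encoding="utf-8")
    return path
```

Calling `write_openhands_settings(os.environ["MISTRAL_API_KEY"], Path.home() / ".openhands-state")` (with `import os`) would target the same location as the `echo` command.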
+### Local inference
+You can also run the model locally, using LMStudio or one of the other providers listed below.
+Launch OpenHands
+You can now interact with the model served from LM Studio through OpenHands. Start the OpenHands server with Docker:
+```bash
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
+docker run -it --rm --pull=always \
+	-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
+	-e LOG_ALL_EVENTS=true \
+	-v /var/run/docker.sock:/var/run/docker.sock \
+	-v ~/.openhands-state:/.openhands-state \
+	-p 3000:3000 \
+	--add-host host.docker.internal:host-gateway \
+	--name openhands-app \
+	docker.all-hands.dev/all-hands-ai/openhands:0.38
+```
+The server will start at http://0.0.0.0:3000. Open it in your browser and you will see an AI Provider Configuration tab.
+Now you can start a new conversation with the agent by clicking on the plus sign on the left bar.
+The model can also be deployed with the following libraries:
+- [`LMStudio (recommended for quantized model)`](https://lmstudio.ai/): See [here](#lmstudio)
+- [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm)
+- [`ollama`](https://github.com/ollama/ollama): See [here](#ollama)
+- [`mistral-inference`](https://github.com/mistralai/mistral-inference): See [here](#mistral-inference)
+- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
+### OpenHands (recommended)
+#### Launch a server to deploy Devstral-Small-2505
+Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then, you can use OpenHands to interact with `Devstral-Small-2505`.
+For this tutorial, we spun up a vLLM server by running the command:
+```bash
+vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
+```
+The server address should be in the following format: `http://<your-server-url>:8000/v1`
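Before wiring the server into OpenHands, it can help to confirm that the address responds. The sketch below assumes the server exposes the OpenAI-style `GET /v1/models` route (vLLM's OpenAI-compatible server does) and uses only the standard library; `models_url` and `list_served_models` are hypothetical helper names, and the default `token` API key is a placeholder.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from a base URL such as http://host:8000/v1."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str, api_key: str = "token", timeout: float = 10.0) -> list[str]:
    """Return the model IDs reported by the server (performs a network request)."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style listings put the models under the "data" key.
    return [entry["id"] for entry in payload.get("data", [])]
```

For example, `list_served_models("http://<your-server-url>:8000/v1")` should include `mistralai/Devstral-Small-2505` while the `vllm serve` command above is running.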
+#### Launch OpenHands
+You can follow the OpenHands installation instructions [here](https://docs.all-hands.dev/modules/usage/installation).
+The easiest way to launch OpenHands is to use the Docker image:
+```bash
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
+docker run -it --rm --pull=always \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
+    -e LOG_ALL_EVENTS=true \
+    -v /var/run/docker.sock:/var/run/docker.sock \
+    -v ~/.openhands-state:/.openhands-state \
+    -p 3000:3000 \
+    --add-host host.docker.internal:host-gateway \
+    --name openhands-app \
+    docker.all-hands.dev/all-hands-ai/openhands:0.38
+```
+Then, you can access the OpenHands UI at `http://localhost:3000`.
+#### Connect to the server
+When accessing the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier.
+Fill the following fields:
+- **Custom Model**: `openai/mistralai/Devstral-Small-2505`
+- **Base URL**: `http://<your-server-url>:8000/v1`
+- **API Key**: `token` (or any other token you used to launch the server if any)
+#### Use OpenHands powered by Devstral
+You're now ready to use Devstral Small inside OpenHands by **starting a new conversation**. Let's build a To-Do list app.
+<details>
+  <summary>To-Do list app</summary>
+1. Let's ask Devstral to generate the app with the following prompt:
+```txt
+Build a To-Do list app with the following requirements:
+- Built using FastAPI and React.
+- Make it a one page app that:
+   - Allows to add a task.
+   - Allows to delete a task.
+   - Allows to mark a task as done.
+   - Displays the list of tasks.
+- Store the tasks in a SQLite database.
+```
+![Agent prompting](assets/tuto_open_hands/agent_prompting.png)
+2. Let's see the result
+You should see the agent construct the app and be able to explore the code it generated.
+If it doesn't do so automatically, ask Devstral to deploy the app or deploy it manually, and then go to the front-end deployment URL to see the app.
+![Agent working](assets/tuto_open_hands/agent_working.png)
+![App UI](assets/tuto_open_hands/app_ui.png)
+3. Iterate
+Now that you have a first result you can iterate on it by asking your agent to improve it. For example, in the app generated we could click on a task to mark it checked but having a checkbox would improve UX. You could also ask it to add a feature to edit a task, or to add a feature to filter the tasks by status.
+Enjoy building with Devstral Small and OpenHands!
+</details>
+### LMStudio (recommended for quantized model)
+Download the weights from Hugging Face:
+```
+pip install -U "huggingface_hub[cli]"
+huggingface-cli download \
+"mistralai/Devstral-Small-2505_gguf" \
+--include "devstralQ4_K_M.gguf" \
+--local-dir "mistralai/Devstral-Small-2505_gguf/"
+```
+You can serve the model locally with [LMStudio](https://lmstudio.ai/).
+* Download [LM Studio](https://lmstudio.ai/) and install it
+* Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`
+* In a bash terminal, run `lms import devstralQ4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2505_gguf`)
+* Open the LMStudio application and click the terminal icon to open the developer tab. Click "Select a model to load" and select Devstral Q4 K M. Toggle the status button to start the model, then in the settings toggle Serve on Local Network on.
+* On the right tab, you will see an API identifier, which should be `devstralq4_k_m`, and an API address under API Usage. Note this address; it is used in the next step.
+Launch OpenHands
+You can now interact with the model served from LM Studio through OpenHands. Start the OpenHands server with Docker:
+```bash
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
+docker run -it --rm --pull=always \
+	-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
+	-e LOG_ALL_EVENTS=true \
+	-v /var/run/docker.sock:/var/run/docker.sock \
+	-v ~/.openhands-state:/.openhands-state \
+	-p 3000:3000 \
+	--add-host host.docker.internal:host-gateway \
+	--name openhands-app \
+	docker.all-hands.dev/all-hands-ai/openhands:0.38
+```
+Click “see advanced setting” on the second line.
+In the new tab, toggle advanced on. Set the custom model to `mistral/devstralq4_k_m` and the Base URL to the API address noted in the last LM Studio step. Set the API Key to `dummy`. Click save changes.
+### vLLM (recommended)
+We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
+to implement production-ready inference pipelines.
+**_Installation_**
+Make sure you install [`vLLM >= 0.8.5`](https://github.com/vllm-project/vllm/releases/tag/v0.8.5):
+```
+pip install vllm --upgrade
+```
+Doing so should automatically install [`mistral_common >= 1.5.4`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.4).
+To check:
+```
+python -c "import mistral_common; print(mistral_common.__version__)"
+```
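To turn the manual check above into a pass/fail test, the installed version can be compared against the minimum. This is a minimal sketch using only the standard library: `parse_release`, `meets_minimum`, and `installed_version` are hypothetical helpers, and the comparison looks only at the leading numeric components, so it does not order pre-releases such as `rc1` correctly.

```python
import re
from importlib import metadata
from typing import Optional


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components, e.g. '0.8.5.post1' -> (0, 8, 5)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For instance, `meets_minimum(installed_version("vllm") or "0", "0.8.5")` returns `False` when vLLM is missing or older than the version recommended above.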
+You can also use a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pull one from the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
+#### Server
+We recommend that you use Devstral in a server/client setting.
+1. Spin up a server:
+```
+vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
+```
+2. To query the server from a client, you can use a simple Python snippet.
+```py
+import requests
+import json
+from huggingface_hub import hf_hub_download
+url = "http://<your-server-url>:8000/v1/chat/completions"
+headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
+model = "mistralai/Devstral-Small-2505"
+def load_system_prompt(repo_id: str, filename: str) -> str:
+    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+    with open(file_path, "r") as file:
+        system_prompt = file.read()
+    return system_prompt
+SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
+messages = [
+    {"role": "system", "content": SYSTEM_PROMPT},
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "text",
+                "text": "Write a function that computes fibonacci in Python.",
+            },
+        ],
+    },
+]
+data = {"model": model, "messages": messages, "temperature": 0.15}
+response = requests.post(url, headers=headers, data=json.dumps(data))
+print(response.json()["choices"][0]["message"]["content"])
+```
+<details>
+    <summary>Output</summary>
+Certainly! The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, usually starting with 0 and 1. Here's a simple Python function to compute the Fibonacci sequence:
+### Iterative Approach
+This approach uses a loop to compute the Fibonacci number iteratively.
+```python
+def fibonacci(n):
+    if n <= 0:
+        return "Input should be a positive integer."
+    elif n == 1:
+        return 0
+    elif n == 2:
+        return 1
+    a, b = 0, 1
+    for _ in range(2, n):
+        a, b = b, a + b
+    return b
+# Example usage:
+print(fibonacci(10))  # Output: 34
+```
+### Recursive Approach
+This approach uses recursion to compute the Fibonacci number. Note that this is less efficient for large `n` due to repeated calculations.
+```python
+def fibonacci_recursive(n):
+    if n <= 0:
+        return "Input should be a positive integer."
+    elif n == 1:
+        return 0
+    elif n == 2:
+        return 1
+    else:
+        return fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2)
+# Example usage:
+print(fibonacci_recursive(10))  # Output: 34
+```
+\### Memoization Approach
+This approach uses memoization to store previously computed Fibonacci numbers, making it more efficient than the simple recursive approach.
+```python
+def fibonacci_memo(n, memo={}):
+    if n <= 0:
+        return "Input should be a positive integer."
+    elif n == 1:
+        return 0
+    elif n == 2:
+        return 1
+    elif n in memo:
+        return memo[n]
+    memo[n] = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
+    return memo[n]
+# Example usage:
+print(fibonacci_memo(10))  # Output: 34
+```
+\### Dynamic Programming Approach
+This approach uses an array to store the Fibonacci numbers up to `n`.
+```python
+def fibonacci_dp(n):
+    if n <= 0:
+        return "Input should be a positive integer."
+    elif n == 1:
+        return 0
+    elif n == 2:
+        return 1
+    fib = [0, 1] + [0] * (n - 2)
+    for i in range(2, n):
+        fib[i] = fib[i - 1] + fib[i - 2]
+    return fib[n - 1]
+# Example usage:
+print(fibonacci_dp(10))  # Output: 34
+```
+You can choose any of these approaches based on your needs. The iterative and dynamic programming approaches are generally more efficient for larger values of `n`.
+</details>
+### Mistral-inference
+We recommend using mistral-inference to quickly try out / "vibe-check" Devstral.
+#### Install
+Make sure to have mistral_inference >= 1.6.0 installed.
+```bash
+pip install mistral_inference --upgrade
+```
+#### Download
+```python
+from huggingface_hub import snapshot_download
+from pathlib import Path
+mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
+mistral_models_path.mkdir(parents=True, exist_ok=True)
+snapshot_download(repo_id="mistralai/Devstral-Small-2505", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
+```
+#### Python
+You can run the model using the following command:
+```bash
+mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300
+```
+If you prompt it with "Write me a unique and efficient function that computes fibonacci in Python", the model should generate something along the following lines:
+<details>
+  <summary>Output</summary>
+Certainly! A common and efficient way to compute Fibonacci numbers is by using memoization to store previously computed values. This avoids redundant calculations and significantly improves performance. Below is a Python function that uses memoization to compute Fibonacci numbers efficiently:
+```python
+def fibonacci(n, memo=None):
+    if memo is None:
+        memo = {}
+    if n in memo:
+        return memo[n]
+    if n <= 1:
+        return n
+    memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
+    return memo[n]
+# Example usage:
+n = 10
+print(f"Fibonacci number at position {n} is {fibonacci(n)}")
+```
+### Explanation:
+1. **Base Case**: If `n` is 0 or 1, the function returns `n` because the Fibonacci sequence starts with 0 and 1.
+2. **Memoization**: The function uses a dictionary `memo` to store the results of previously computed Fibonacci numbers.
+3. **Recursive Case**: For other values of `n`, the function recursively computes the Fibonacci number by summing the results of `fibonacci(n - 1)` and `fibonacci(n)`
+</details>
+### Ollama
+You can run Devstral using the [Ollama](https://ollama.ai/) CLI.
+```bash
+ollama run devstral
+```
+### Transformers
+To make the best use of our model with transformers, make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.5.5` to use our tokenizer.
+```bash
+pip install mistral-common --upgrade
+```
+Then load our tokenizer along with the model and generate:
+```python
+import torch
+from mistral_common.protocol.instruct.messages import (
+    SystemMessage, UserMessage
+)
+from mistral_common.protocol.instruct.request import ChatCompletionRequest
+from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
+from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy
+from huggingface_hub import hf_hub_download
+from transformers import AutoModelForCausalLM
+def load_system_prompt(repo_id: str, filename: str) -> str:
+    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+    with open(file_path, "r") as file:
+        system_prompt = file.read()
+    return system_prompt
+model_id = "mistralai/Devstral-Small-2505"
+tekken_file = hf_hub_download(repo_id=model_id, filename="tekken.json")
+SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")
+tokenizer = MistralTokenizer.from_file(tekken_file)
+model = AutoModelForCausalLM.from_pretrained(model_id)
+tokenized = tokenizer.encode_chat_completion(
+    ChatCompletionRequest(
+        messages=[
+            SystemMessage(content=SYSTEM_PROMPT),
+            UserMessage(content="Write me a function that computes fibonacci in Python."),
+        ],
+    )
+)
+output = model.generate(
+    input_ids=torch.tensor([tokenized.tokens]),
+    max_new_tokens=1000,
+)[0]
+decoded_output = tokenizer.decode(output[len(tokenized.tokens):])
+print(decoded_output)
+```