---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- mathematical-reasoning
- qwen
- causal-lm
---

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
  <img src="assets/MiromindAI_H.svg" width="50%" alt="MiroMindM1" />
</div>
<!-- <hr> -->
<div align="center">

[![Models](https://img.shields.io/badge/Models-5EDDD2?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/miromind-ai/MiroMind-M1-RL-7B)
[![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)
[![Paper](https://img.shields.io/badge/Paper-000000?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2507.14683)
[![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/MiroMindAsia/MiroMind-M1)
[![Website](https://img.shields.io/badge/Website-000000?style=for-the-badge&logo=google-chrome&logoColor=white)](https://miromind.ai/)

</div>

This repository contains the MiroMind-M1-RL-32B model, part of the MiroMind-M1 series, described in the paper [MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization](https://huggingface.co/papers/2507.14683).

# MiroMind-M1


## 🧾 Overview
<div align="center">
  <img src="assets/7b_performance_training.png" width="80%" alt="7B Model Training Performance" />
  <p><i>Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.</i></p>
</div>

**MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5`, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (**SFT**) on 719K curated problems and reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples, using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups openly released.


## 📊 Evaluation

### MiroMind-M1-SFT
| Model           | Initial Checkpoint         | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|------------------|----------------------------|--------|--------|---------|
| DeepSeek-R1-Distill                  | Qwen2.5-Math-7B             | 55.5   | 40.4†  | 92.8    |
| OpenThoughts                         | Qwen2.5-7B-Instruct          | 31.3   | 23.3   | 83.2    |
| Open-R1                              | Qwen2.5-Math-7B-Instruct     | 36.7   | 40.0   | 90.6    |
| Synthetic-1                          | Qwen2.5-7B-Instruct          | 30.0   | 26.6   | 85.6    |
| MiMo-7B-SFT                          | MiMo-7B-Base          | 58.7   | 44.3   | 93.0    |
| **MiroMind-SFT-7B**                  | Qwen2.5-Math-7B             | 60.4   | 45.0   | 94.6    |

*† The AIME25 score of DeepSeek-R1-Distill is from our own evaluation.*

### MiroMind-M1-RL
| Model                            | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|----------------------------------|--------|--------|---------|
| DeepSeek-R1                      | 79.8   | 70.0   | –       |
| DeepSeek-R1-0528                 | 91.4   | 87.5   | –       |
| Qwen3-8B                         | 76.0   | 67.3   | –       |
| DeepSeek-R1-0528-Qwen3-8B        | 86.0   | 76.3   | –       |
| MiMo-7B-RL                       | 68.2   | 55.4   | 95.8    |
| <tr><td colspan="4" align="center"><em><strong>32B models trained from the Qwen2.5 series</strong></em></td></tr> |
| DeepSeek-R1-Distill-Qwen-32B     | 70.8   | 52.1   | 95.8    |
| Skywork-OR1-32B-Preview          | 77.1   | 68.2   | 97.5    |
| **MiroMind-M1-RL-32B**           | 77.5   | 65.6   | 96.4    |
| <tr><td colspan="4" align="center"><em><strong>7B models trained from the Qwen2.5 series</strong></em></td></tr> |
| DeepSeek-R1-Distill-Qwen-7B      | 55.5   | 39.2   | –       |
| **MiroMind-M1-SFT-7B**           | 60.4   | 45.0   | 94.6    |
| Light-R1-7B-DS                   | 59.1   | 44.3   | –       |
| Skywork-OR1-7B                   | 72.2   | 54.6   | –       |
| **MiroMind-M1-RL-7B**            | 73.4   | 57.8   | 96.7    |
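
For context, avg@k denotes accuracy averaged over k independent sampled runs per benchmark (64 for AIME, 5 for MATH500). Below is a minimal sketch of the metric; the function and variable names are illustrative assumptions, not taken from the released evaluation scripts.

```python
# Minimal sketch of the avg@k metric used in the tables above:
# accuracy averaged over k independent generation runs.
def avg_at_k(run_correctness: list[list[bool]]) -> float:
    """run_correctness[i][j] is whether run i solved problem j."""
    per_run_acc = [sum(run) / len(run) for run in run_correctness]
    return sum(per_run_acc) / len(per_run_acc)

# Example: 3 runs over 4 problems.
runs = [
    [True, True, False, True],   # run 1: 75%
    [True, False, False, True],  # run 2: 50%
    [True, True, True, True],    # run 3: 100%
]
print(f"avg@3 = {avg_at_k(runs):.3f}")  # 0.750
```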


## 🔗 Resources
### Models
[`MiroMind-M1-SFT-7B`](https://huggingface.co/miromind-ai/MiroMind-M1-SFT-7B)<br>
[`MiroMind-M1-RL-7B`](https://huggingface.co/miromind-ai/MiroMind-M1-RL-7B)<br>
[`MiroMind-M1-RL-32B`](https://huggingface.co/miromind-ai/MiroMind-M1-RL-32B)<br>

### Data
[`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
[`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>

## 🚀 Quickstart

You can explore the models using the Transformers library.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "miromind-ai/MiroMind-M1-RL-32B" # Or miromind-ai/MiroMind-M1-RL-7B
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Given the equation $2x + 5 = 11$, what is the value of $x$?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
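
For long chain-of-thought generations, a dedicated inference engine can be considerably faster than raw Transformers. Below is a minimal sketch using vLLM; using vLLM, and the sampling values shown, are our assumptions rather than an officially documented setup for this model.

```python
# Hedged sketch: serving the model with vLLM instead of Transformers.
# The sampling parameters below are common choices for R1-style
# reasoning models, not values recommended by this model card.
from vllm import LLM, SamplingParams

llm = LLM(model="miromind-ai/MiroMind-M1-RL-7B")  # the 32B model needs multiple GPUs
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

prompt = "Given the equation $2x + 5 = 11$, what is the value of $x$?"
outputs = llm.chat([{"role": "user", "content": prompt}], params)
print(outputs[0].outputs[0].text)
```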

## 🛠 Getting Started

### Installation

Set up a Python 3.10 virtual environment and install the dependencies:

```bash
git clone https://github.com/MiroMindAsia/MiroMind-M1.git
cd MiroMind-M1

# Install Python 3.10 environment.
python3.10 -m pip install virtualenv
virtualenv -p python3.10 venv
source venv/bin/activate

# Install dependencies.
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install numpy psutil ninja packaging cmake
pip3 install flash_attn==2.7.4.post1 --no-build-isolation # This may take a while...
pip3 install -e .
```
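
After installation, you can sanity-check the environment with a short Python snippet (a minimal sketch; the expected versions come from the pinned installs above):

```python
# Quick sanity check that the pinned dependencies installed correctly.
import torch
import flash_attn

print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")  # expect 2.4.0 / 12.4
print(f"flash_attn {flash_attn.__version__}")                   # expect 2.7.4.post1
assert torch.cuda.is_available(), "CUDA device not visible"
```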

## 🏋️ Training

### Multi-Node Training

Here is a quick guide to starting Ray for multi-node training.

#### On the head node
```bash
ray stop
ray start --head --node-ip-address $HEAD_NODE_IP --num-gpus 8 --dashboard-host=0.0.0.0
```

#### On other nodes
```bash
ray stop
ray start --address="$HEAD_NODE_IP:6379" --num-gpus 8
```
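
Once every node has joined, you can confirm the cluster sees all GPUs before launching training (a minimal sketch using Ray's public API):

```python
# Verify the Ray cluster from any node before starting training.
import ray

ray.init(address="auto")  # attach to the running cluster
alive = [n for n in ray.nodes() if n["Alive"]]
gpus = sum(n["Resources"].get("GPU", 0) for n in alive)
print(f"{len(alive)} node(s) alive, {gpus:.0f} GPU(s) total")
```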

### Start Training

First, set the environment variables below:

```bash
export MODEL_PATH=YOUR_MODEL_PATH
export CKPTS_DIR=YOUR_CKPTS_DIR
export TRAIN_FILE=YOUR_TRAIN_FILE
export TEST_FILE=YOUR_TEST_FILE
export HOME=YOUR_HOME_PATH
```

Then run the script below to start training:

```bash
bash m1_train_script/campo_32b.sh
```

## ⚖️ Run Evaluation

We provide ready-to-use evaluation scripts in the `m1_eval_script/` directory for mathematical reasoning benchmarks.

### Quick Start

```bash
# Evaluate on AIME 2024
bash m1_eval_script/evaluate_7b_aime24.sh

# Evaluate on AIME 2025  
bash m1_eval_script/evaluate_7b_aime25.sh

# Evaluate on Math-500
bash m1_eval_script/evaluate_7b_math500.sh
```

### Supported Benchmarks

| Dataset | Script | Standard Runs |
|---------|--------|---------------|
| **AIME 2024** | `evaluate_7b_aime24.sh` | 64 runs |
| **AIME 2025** | `evaluate_7b_aime25.sh` | 64 runs |
| **Math-500** | `evaluate_7b_math500.sh` | 5 runs |

### Results

Results are saved in `results/[model_name]/[dataset_name]/` with:
- `average_accuracy.txt`: Final accuracy score
- `run[X]_inference_eval_results.csv`: Detailed results
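
A small sketch for collecting scores across models and datasets from this layout; parsing `average_accuracy.txt` as a single float is an assumption about the file's contents.

```python
# Hedged sketch: walk results/[model_name]/[dataset_name]/ and print scores.
# Assumes each average_accuracy.txt holds one parseable accuracy value.
from pathlib import Path

for acc_file in sorted(Path("results").glob("*/*/average_accuracy.txt")):
    model, dataset = acc_file.parts[1], acc_file.parts[2]
    accuracy = float(acc_file.read_text().strip())
    print(f"{model:40s} {dataset:12s} {accuracy:.2f}")
```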

## 🙏 Acknowledgement

The RL training is built on the wonderful [`verl`](https://github.com/volcengine/verl) project.