---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- mathematical-reasoning
- qwen
- causal-lm
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="assets/MiromindAI_H.svg" width="50%" alt="MiroMindM1" />
</div>
<!-- <hr> -->
<div align="center">
[🤗 Model](https://huggingface.co/miromind-ai/MiroMind-M1-RL-7B) | [📊 Dataset](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K) | [📄 Paper](https://arxiv.org/abs/2507.14683) | [💻 Code](https://github.com/MiroMindAsia/MiroMind-M1) | [🌐 Website](https://miromind.ai/)
</div>
This repository contains the MiroMind-M1-RL-32B model, part of the MiroMind-M1 series, described in the paper [MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization](https://huggingface.co/papers/2507.14683).
# MiroMind-M1
## 🧾 Overview
<div align="center">
<img src="assets/7b_performance_training.png" width="80%" alt="7B Model Training Performance" />
<p><i>Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.</i></p>
</div>
**MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5`, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (**SFT**) on 719K curated problems and reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples, using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups openly released.
## 📊 Evaluation
### MiroMind-M1-SFT
| Model | Initial Checkpoint | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|------------------|----------------------------|--------|--------|---------|
| DeepSeek-R1-Distill | Qwen2.5-Math-7B | 55.5 | 40.4† | 92.8 |
| OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
| Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
| Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
| MiMo-7B-SFT | MiMo-7B-Base | 58.7 | 44.3 | 93.0 |
| **MiroMind-SFT-7B** | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |
*† The AIME25 score of DeepSeek-R1-Distill is from our own evaluation.*
### MiroMind-M1-RL
| Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|----------------------------------|--------|--------|---------|
| DeepSeek-R1 | 79.8 | 70.0 | – |
| DeepSeek-R1-0528 | 91.4 | 87.5 | – |
| Qwen3-8B | 76.0 | 67.3 | – |
| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
| MiMo-7B-RL | 68.2 | 55.4 | 95.8 |
| ***32B models trained from the Qwen2.5 series*** | | | |
| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4 |
| ***7B models trained from the Qwen2.5 series*** | | | |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |
| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | – |
| Skywork-OR1-7B | 72.2 | 54.6 | – |
| **MiroMind-M1-RL-7B** | 73.4 | 57.8 | 96.7 |
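Here, `avg@k` denotes accuracy averaged over k independently sampled runs of the benchmark. A minimal sketch of the computation (the function and variable names are illustrative, not from this repo):

```python
# Sketch of avg@k: mean per-run accuracy over k sampled runs.
# `runs` is a hypothetical k x n matrix: runs[i][j] == 1 if the i-th
# sampled answer to problem j is correct, else 0.
def avg_at_k(runs: list[list[int]]) -> float:
    per_run_acc = [sum(run) / len(run) for run in runs]  # accuracy of each run
    return sum(per_run_acc) / len(per_run_acc)           # mean over the k runs

# Example: 3 runs over 4 problems, each run scoring 3/4.
print(avg_at_k([[1, 0, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1]]))  # 0.75
```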
## 🔗 Resources
### Models
[`MiroMind-M1-SFT-7B`](https://huggingface.co/miromind-ai/MiroMind-M1-SFT-7B)<br>
[`MiroMind-M1-RL-7B`](https://huggingface.co/miromind-ai/MiroMind-M1-RL-7B)<br>
[`MiroMind-M1-RL-32B`](https://huggingface.co/miromind-ai/MiroMind-M1-RL-32B)<br>
### Data
[`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
[`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>
## 🚀 Quickstart
You can explore the models using the Transformers library.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "miromind-ai/MiroMind-M1-RL-32B" # Or miromind-ai/MiroMind-M1-RL-7B
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True
)
prompt = "Given the equation $2x + 5 = 11$, what is the value of $x$?"
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
model_inputs.input_ids,
max_new_tokens=512
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
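For higher-throughput inference, the model can also be served with vLLM. A minimal sketch, assuming vLLM is installed; the sampling parameters and `tensor_parallel_size` are illustrative, not official recommendations:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "miromind-ai/MiroMind-M1-RL-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Build the chat-formatted prompt the same way as in the Transformers example.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Given the equation $2x + 5 = 11$, what is the value of $x$?"}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=model_name, trust_remote_code=True, tensor_parallel_size=2)  # adjust to your GPU count
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```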
## 🛠 Getting Started
### Installation
Create a virtual environment and install the dependencies:
```bash
git clone https://github.com/MiroMindAsia/MiroMind-M1.git
cd MiroMind-M1
# Install Python 3.10 environment.
python3.10 -m pip install virtualenv
virtualenv -p python3.10 venv
source venv/bin/activate
# Install dependencies.
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install numpy psutil ninja packaging cmake
pip3 install flash_attn==2.7.4.post1 --no-build-isolation # This may take a while...
pip3 install -e .
```
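After installation, a quick sanity check (a sketch, not part of the repo's scripts) confirms that the pinned PyTorch build and FlashAttention import cleanly and that CUDA is visible:

```python
# Sanity check: verify the pinned dependencies import and CUDA is visible.
import torch
import flash_attn  # noqa: F401  (importing alone confirms the wheel built correctly)

print("torch:", torch.__version__)               # expect 2.4.0
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```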
## 🏋️ Training
### Multi-Node Training
Here is a quick guide to starting Ray for multi-node training.
#### On the head node
```bash
ray stop
ray start --head --node-ip-address $HEAD_NODE_IP --num-gpus 8 --dashboard-host=0.0.0.0
```
#### On other nodes
```bash
ray stop
ray start --address="$HEAD_NODE_IP:6379" --num-gpus 8
```
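Once all nodes have joined, you can verify the cluster from any node; `ray status` on the command line works, or a short Python check (a sketch using Ray's public API):

```python
# Connect to the running Ray cluster and report its nodes and GPUs.
import ray

ray.init(address="auto")  # attaches to the cluster started above
alive = [n for n in ray.nodes() if n["Alive"]]
print("alive nodes:", len(alive))
print("total GPUs:", int(ray.cluster_resources().get("GPU", 0)))
```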
### Start Training
First, set the environment variables below:
```bash
export MODEL_PATH=YOUR_MODEL_PATH
export CKPTS_DIR=YOUR_CKPTS_DIR
export TRAIN_FILE=YOUR_TRAIN_FILE
export TEST_FILE=YOUR_TEST_FILE
export HOME=YOUR_HOME_PATH
```
Then run the script below to start training:
```bash
bash m1_train_script/campo_32b.sh
```
## ⚖️ Run Evaluation
We provide ready-to-use evaluation scripts in the `m1_eval_script/` directory for mathematical reasoning benchmarks.
### Quick Start
```bash
# Evaluate on AIME 2024
bash m1_eval_script/evaluate_7b_aime24.sh
# Evaluate on AIME 2025
bash m1_eval_script/evaluate_7b_aime25.sh
# Evaluate on Math-500
bash m1_eval_script/evaluate_7b_math500.sh
```
### Supported Benchmarks
| Dataset | Script | Standard Runs |
|---------|--------|---------------|
| **AIME 2024** | `evaluate_7b_aime24.sh` | 64 runs |
| **AIME 2025** | `evaluate_7b_aime25.sh` | 64 runs |
| **Math-500** | `evaluate_7b_math500.sh` | 5 runs |
### Results
Results are saved in `results/[model_name]/[dataset_name]/` with:
- `average_accuracy.txt`: Final accuracy score
- `run[X]_inference_eval_results.csv`: Detailed results
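To aggregate scores across models and benchmarks, a small helper can walk the results tree. This is a sketch assuming the directory layout above; `results_root` is illustrative:

```python
# Collect every average_accuracy.txt under results/[model_name]/[dataset_name]/.
from pathlib import Path

results_root = Path("results")  # illustrative; matches the layout described above
for acc_file in sorted(results_root.glob("*/*/average_accuracy.txt")):
    model, dataset = acc_file.parts[-3], acc_file.parts[-2]
    print(f"{model} / {dataset}: {acc_file.read_text().strip()}")
```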
## 🙏 Acknowledgement
The RL training is built on the wonderful [`verl`](https://github.com/volcengine/verl) project.