MiroMind-M1

🧾 Overview

Figure: Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.

MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (SFT) on 719K curated problems and reinforcement learning with verifiable rewards (RLVR) on 62K challenging examples, using a context-aware multi-stage policy optimization method (CAMPO). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B), data (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K), and training setups openly released.
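The released checkpoints load like any Hugging Face causal LM. Below is a minimal inference sketch; the 7B repository ID and the generation settings are illustrative assumptions (the 32B ID `miromind-ai/MiroMind-M1-RL-32B` appears in the Resources section), not the authors' recommended configuration.

```python
# Minimal inference sketch (assumptions: standard HF chat-model loading applies;
# sampling settings below are illustrative, not an official recommendation).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "miromind-ai/MiroMind-M1-RL-7B"  # or "miromind-ai/MiroMind-M1-RL-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Find all real x such that x^2 - 5x + 6 = 0."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long reasoning traces need a generous token budget.
outputs = model.generate(
    inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```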

📊 Evaluation

MiroMind-M1-SFT

| Model | Initial Checkpoint | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|---|---|---|---|---|
| DeepSeek-R1-Distill | Qwen2.5-Math-7B | 55.5 | 40.4† | 92.8 |
| OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
| Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
| Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
| MiroMind-M1-SFT-7B | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |

† The AIME25 score of DeepSeek-R1-Distill is from our own evaluation.

MiroMind-M1-RL

| Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|---|---|---|---|
| DeepSeek-R1 | 79.8 | 70.0 | – |
| DeepSeek-R1-0528 | 91.4 | 87.5 | – |
| Qwen3-8B | 76.0 | 67.3 | – |
| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
| **32B models trained from the Qwen2.5 series** | | | |
| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
| MiroMind-M1-RL-32B | 77.5 | 65.6 | 96.4 |
| **7B models trained from the Qwen2.5 series** | | | |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |
| MiroMind-M1-SFT-7B | 60.4 | 45.0 | 94.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | – |
| Skywork-OR1-7B | 72.2 | 54.6 | – |
| MiroMind-M1-RL-7B | 73.4 | 57.8 | 96.7 |
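The avg@k columns above report accuracy averaged over k sampled generations per problem (64 for AIME, 5 for MATH500). A small sketch of that aggregation, under the assumption that avg@k is the plain mean of per-sample correctness:

```python
# Sketch of the assumed avg@k aggregation: average correctness over k sampled
# generations per problem, then average across problems.
from statistics import mean

def avg_at_k(per_problem_correct: list[list[bool]]) -> float:
    """per_problem_correct[i][j] is True if sample j for problem i is correct."""
    return mean(mean(samples) for samples in per_problem_correct)

# Example: 2 problems, k = 4 samples each.
results = [
    [True, True, False, True],    # problem 1: 3/4 correct
    [False, False, True, False],  # problem 2: 1/4 correct
]
print(f"avg@4 = {avg_at_k(results):.3f}")  # 0.500
```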

🔗 Resources

Models

- MiroMind-M1-SFT-7B
- MiroMind-M1-RL-7B
- MiroMind-M1-RL-32B

Data

- MiroMind-M1-SFT-719K
- MiroMind-M1-RL-62K
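Both corpora should be loadable with the 🤗 `datasets` library. The sketch below assumes the repository IDs mirror the dataset names under the `miromind-ai` organization and that a `train` split exists; inspect the schema before wiring up a training loop.

```python
# Sketch: loading the released training corpora.
# Assumptions: repo IDs follow the dataset names under the miromind-ai org,
# and each dataset exposes a "train" split.
from datasets import load_dataset

sft_data = load_dataset("miromind-ai/MiroMind-M1-SFT-719K", split="train")
rl_data = load_dataset("miromind-ai/MiroMind-M1-RL-62K", split="train")

print(sft_data)              # dataset size and column names
print(rl_data.column_names)  # inspect the schema first; column names may differ
```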
