Update README.md
README.md (changed)

@@ -176,16 +176,19 @@ special_tokens:
 
 The evaluation code is available at [Hugging Face Open-R1](https://github.com/huggingface/open-r1). Note that I have updated the AIME 25 dataset to the full set, available at [AIME 2025](https://huggingface.co/datasets/yentinglin/aime_2025).
 
-
-
-The results below are averaged over four runs.
-
-| Dataset | Pass@1 Score |
-|--------------|-------------|
-| MATH-500 | 0.95 |
-| AIME 2025 | 0.5333 |
-| AIME 2024 | 0.6667 |
-| GPQA Diamond | 0.6202 |
+Our results below are averaged over four runs. All scores are Pass@1 (%); "-" means no score is reported.
+
+| Model                         | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
+|-------------------------------|----------|----------|-----------|-----------|--------------|
+| Mistral-24B-Reasoning (Ours)  | 24B      | 95.0     | 53.33     | 66.67     | 62.02        |
+| DeepSeek-R1-Distill-Llama-70B | 70B      | 94.5     | 46.67     | 70.0      | 65.2         |
+| DeepSeek-R1-Distill-Qwen-32B  | 32B      | 94.3     | 60.0      | 72.6      | 62.1         |
+| s1.1-32B                      | 32B      | 93.2     | 40.0      | 56.7      | 61.62        |
+| DeepSeek-R1                   | 671B     | 97.3     | 70.0      | 72.6      | 71.5         |
+| o1                            | -        | -        | 79.0      | -         | -            |
+| o3-mini (high)                | -        | -        | 86.5      | -         | -            |
+| o3-mini (medium)              | -        | -        | 76.5      | -         | -            |
+
 
 ## Citation
 
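The README reports Pass@1 averaged over four runs. Below is a minimal sketch of that averaging. It assumes that each run samples one answer per problem and that every answer is marked either correct or incorrect. The names `pass_at_1` and `averaged_pass_at_1`, as well as the per-run results, are hypothetical illustrations. They are not code or data from the Open-R1 evaluation harness.

```python
from statistics import mean


def pass_at_1(correct: list[bool]) -> float:
    """Pass@1 for one run: the fraction of problems whose single sampled answer is correct."""
    if not correct:
        raise ValueError("a run must contain at least one problem")
    return sum(correct) / len(correct)


def averaged_pass_at_1(runs: list[list[bool]]) -> float:
    """Mean of per-run Pass@1 over independent runs on the same problem set.

    Assumption (hypothetical layout): runs[i][j] is True if run i answered problem j correctly.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_problems = len(runs[0])
    if any(len(run) != n_problems for run in runs):
        raise ValueError("every run must cover the same problems")
    return mean(pass_at_1(run) for run in runs)


if __name__ == "__main__":
    # Hypothetical results: 4 runs on a toy 5-problem benchmark.
    runs = [
        [True, True, False, True, False],   # 3/5 correct
        [True, False, False, True, False],  # 2/5 correct
        [True, True, True, True, False],    # 4/5 correct
        [True, True, False, False, False],  # 2/5 correct
    ]
    score = averaged_pass_at_1(runs)
    print(f"Pass@1 = {100 * score:.2f}%")  # Pass@1 = 55.00%
```

Averaging per-run accuracy over four runs gives the same number as a second method. In that method, each problem is scored as c/4, where c is the number of runs that solved it, and the scores are then averaged over problems. This is the k = 1 case of the unbiased pass@k estimator 1 - C(n-c, k)/C(n, k). Multiplying by 100 gives the percentage scale used in the table.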