yentinglin committed on
Commit 4160111 · verified · 1 Parent(s): 76d57cf

Update README.md

Files changed (1)
  1. README.md +12 -9
README.md CHANGED
@@ -176,16 +176,19 @@ special_tokens:
 
  The evaluation code is available at [Hugging Face Open-R1](https://github.com/huggingface/open-r1). Note that I have updated the AIME 25 dataset to the full set, available at [AIME 2025](https://huggingface.co/datasets/yentinglin/aime_2025).
 
- ### Pass@1 Scores
+ Our results below are averaged over four runs.
+
+ | Pass@1 Score | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
+ |-----------------------------------|---------|---------|-----------|-----------|--------------|
+ | Mistral-24B-Reasoning (Ours) | 24B | 0.95 | 0.5333 | 0.6667 | 0.6202 |
+ | DeepSeek-R1-Distill-Llama-70B | 70B | 94.5 | 46.67 | 70.0 | 65.2 |
+ | DeepSeek-R1-Distill-Qwen-32B | 32B | 94.3 | 60.0 | 72.6 | 62.1 |
+ | s1.1-32B | 32B | 93.2 | 40.0 | 56.7 | 61.62 |
+ | DeepSeek-R1 | 671B | 97.3 | 70.0 | 72.6 | 71.5 |
+ | o1 | - | - | 79.0 | - | - |
+ | o3-mini (high) | - | - | 86.5 | - | - |
+ | o3-mini (medium) | - | - | 76.5 | - | - |
 
- The results below are averaged over four runs.
-
- | Dataset | Pass@1 Score |
- |--------------|-------------|
- | MATH-500 | 0.95 |
- | AIME 2025 | 0.5333 |
- | AIME 2024 | 0.6667 |
- | GPQA Diamond | 0.6202 |
 
  ## Citation
 
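The README text in this change says the scores are averaged over four runs, and in the added table the "Ours" row is on a 0 to 1 scale while the comparison rows appear to be on a 0 to 100 scale. The following is a minimal sketch of how such a number could be computed and put on a common scale when reading the table. It is not the Open-R1 evaluation code: the helper names (`pass_at_1`, `mean_pass_at_1`, `to_percent`), the rule that values at or below 1.0 are fractions, and the per-run results are all assumptions made for illustration.

```python
from statistics import mean


def pass_at_1(correct):
    """Fraction of problems answered correctly with one sampled answer each."""
    return sum(correct) / len(correct)


def mean_pass_at_1(runs):
    """Average the per-run pass@1 over independent evaluation runs."""
    return mean(pass_at_1(run) for run in runs)


def to_percent(score):
    """Assumption: values <= 1.0 are fractions; larger values are already percentages."""
    return score * 100 if score <= 1.0 else score


# Hypothetical per-problem outcomes for 4 runs over a 30-problem set.
# These are NOT the actual evaluation outputs behind the table.
runs = [
    [True] * 16 + [False] * 14,
    [True] * 15 + [False] * 15,
    [True] * 17 + [False] * 13,
    [True] * 16 + [False] * 14,
]

score = mean_pass_at_1(runs)
print(f"fraction: {score:.4f}, percent: {to_percent(score):.2f}")
```

With a helper like `to_percent`, a fraction such as 0.95 and a percentage such as 94.5 can be compared directly, provided the threshold assumption holds for every cell in the table.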