yentinglin/Mistral-Small-24B-Instruct-2501-reasoning


yentinglin committed on Feb 17

Commit 4160111 · verified · 1 Parent(s): 76d57cf

Update README.md

Files changed (1)

README.md +12 -9


@@ -176,16 +176,19 @@ special_tokens:
 The evaluation code is available at [Hugging Face Open-R1](https://github.com/huggingface/open-r1). Note that I have updated the AIME 25 dataset to the full set, available at [AIME 2025](https://huggingface.co/datasets/yentinglin/aime_2025).
-### Pass@1 Scores
-The results below are averaged over four runs.
-| Dataset       | Pass@1 Score |
-|--------------|-------------|
-| MATH-500     | 0.95        |
-| AIME 2025    | 0.5333      |
-| AIME 2024    | 0.6667      |
-| GPQA Diamond | 0.6202      |
+Our results below are averaged over four runs.
+| Model (Pass@1, %)                 | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
+|-----------------------------------|---------|---------|-----------|-----------|--------------|
+| Mistral-24B-Reasoning (Ours)      | 24B     | 95      | 53.33     | 66.67     | 62.02        |
+| DeepSeek-R1-Distill-Llama-70B     | 70B     | 94.5    | 46.67     | 70.0      | 65.2         |
+| DeepSeek-R1-Distill-Qwen-32B      | 32B     | 94.3    | 60.0      | 72.6      | 62.1         |
+| s1.1-32B                          | 32B     | 93.2    | 40.0      | 56.7      | 61.62        |
+| DeepSeek-R1                       | 671B    | 97.3    | 70.0      | 72.6      | 71.5         |
+| o1                                | -       | -       | 79.0      | -         | -            |
+| o3-mini (high)                    | -       | -       | 86.5      | -         | -            |
+| o3-mini (medium)                  | -       | -       | 76.5      | -         | -            |
 ## Citation
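
The README line "averaged over four runs" can be illustrated with a minimal sketch. It assumes each run yields one boolean per problem (correct or not) and that the reported figure is the plain mean of the per-run Pass@1 values. The function names and the sample data are hypothetical and are not taken from the Open-R1 evaluation code.

```python
from statistics import mean


def pass_at_1(correct_flags):
    """Fraction of problems answered correctly in one run (one sample per problem)."""
    if not correct_flags:
        raise ValueError("a run must contain at least one problem")
    return sum(correct_flags) / len(correct_flags)


def averaged_pass_at_1(runs):
    """Mean of the per-run Pass@1 values across independent runs."""
    return mean(pass_at_1(run) for run in runs)


# Hypothetical results: four runs over the same five problems.
runs = [
    [True, True, False, True, False],
    [True, False, False, True, False],
    [True, True, True, True, False],
    [True, True, False, True, False],
]

score = averaged_pass_at_1(runs)
# Print both the fraction and the percentage form, since the table mixes the two scales.
print(f"Pass@1: {score:.4f} ({score * 100:.2f}%)")
```

Reporting the fraction and the percentage side by side makes it easier to compare a value such as 0.5333 with baselines quoted as 46.67 or 60.0.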