Update README.md
README.md (changed)
@@ -49,9 +49,6 @@ print(outputs[0].outputs[0].text)
 
 # 📃Evaluation
 
-LUFFY is evaluated on six competition-level benchmarks, achieving state-of-the-art results among all zero-RL methods. It surpasses both on-policy RL and imitation learning (SFT), especially in generalization:
-
-## LUFFY on Qwen2.5-Instruct-7B
 | **Model** | **AIME 2024** | **AIME 2025** | **AMC** | **MATH-500** | **Minerva** | **Olympiad** | **Avg.** |
 |-----------------------------------|-------------|-------------|---------|---------------|-------------|---------------|----------|
 | Qwen2.5-7B-Instruct | 11.9 | 7.6 | 44.1 | 74.6 | 30.5 | 39.7 | 34.7 |