RePO: Replay-Enhanced Policy Optimization
- Siheng99/Qwen2.5-Math-1.5B-DeepMath-1024samples-GRPO
  Text Generation • 2B
- Siheng99/Qwen2.5-Math-1.5B-DeepMath-1024samples-RePO
  Text Generation • 2B
- Siheng99/Qwen2.5-Math-7B-DeepMath-1024samples-GRPO
  Text Generation • 8B
- Siheng99/Qwen2.5-Math-7B-DeepMath-1024samples-RePO
  Text Generation • 8B