🧠Parallel-R1-Unseen_Step_200
Mid-Training Checkpoint of Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Stage: After 200 RL steps with alternating rewards. This checkpoint exhibits adaptive parallel reasoning and serves as the structure-exploration stage.
This checkpoint is intended to help you reproduce the experimental results in Section 4.5: Extra Bonus: Parallel Thinking as a Mid-Training Exploration Strategy for RL Training.