sail/Sailor2-L-8B
Text Generation • 9B • Updated • 10
Rethinking the Trust Region in LLM Reinforcement Learning
Revisiting Parameter Server in LLM Post-Training