A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published 8 days ago • 153
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 176
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published 10 days ago • 77