R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published Aug 28, 2025 • 110
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 181
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 182
VLM-Reasoner/details_._ckpt_Qwen2.5-VL-3B-Instruct-kl-rb Viewer • Updated Jun 1, 2025 • 1.52k • 17