TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning Paper • 2510.06217 • Published 27 days ago • 62
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 21
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30 • 54
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27 • 84
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning Paper • 2506.01347 • Published Jun 2 • 3
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning Paper • 2505.16421 • Published May 22 • 19