TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning Paper • 2510.06217 • Published 16 days ago • 61
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 21
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published 23 days ago • 52
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published 23 days ago • 52 • 3