Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute Paper • 2506.15882 • Published Jun 18 • 2
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published 9 days ago • 113
The Invisible Leash: Why RLVR May Not Escape Its Origin Paper • 2507.14843 • Published 8 days ago • 77
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Paper • 2507.15778 • Published 7 days ago • 19