HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark? Paper • 2509.07894 • Published 2 days ago • 26
CompassVerifier Collection • CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward • 5 items • Updated 11 days ago • 6
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis Paper • 2508.15754 • Published 21 days ago • 4
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward Paper • 2508.03686 • Published Aug 5 • 35
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards Paper • 2507.09104 • Published Jul 12 • 17
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published Jul 9 • 28
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published Jul 8 • 20
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective Paper • 2505.19815 • Published May 26 • 37