Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment Paper • 2505.11821 • Published May 17 • 14
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving Paper • 2507.06804 • Published 24 days ago • 15
Self-Calibration Collection Efficient Test-Time Scaling via Self-Calibration https://arxiv.org/abs/2503.00031 • 7 items • Updated Jun 8 • 2
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning Paper • 2506.09033 • Published Jun 10 • 7
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 128
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space Paper • 2505.15778 • Published May 21 • 17
POSS: Position Specialist Generates Better Draft for Speculative Decoding Paper • 2506.03566 • Published Jun 4 • 6
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 106
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning Paper • 2505.16421 • Published May 22 • 19
Generative AI Act II: Test Time Scaling Drives Cognition Engineering Paper • 2504.13828 • Published Apr 18 • 17
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 280
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 300
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published Apr 9 • 75
Optimizing Language Model's Reasoning Abilities with Weak Supervision Paper • 2405.04086 • Published May 7, 2024 • 2