reasoning_model
updated
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
•
2511.16334
•
Published
•
92
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
•
2509.07980
•
Published
•
101
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Paper
•
2509.04475
•
Published
•
3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
•
2512.01374
•
Published
•
93
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
•
2511.22570
•
Published
•
84
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
Paper
•
2512.07843
•
Published
•
21
Paper
•
2510.01141
•
Published
•
119
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper
•
2504.11468
•
Published
•
30
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
•
2501.09686
•
Published
•
41
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Paper
•
2410.09671
•
Published
•
1
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
•
2512.16676
•
Published
•
195
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
Paper
•
2512.17260
•
Published
•
48
Latent Implicit Visual Reasoning
Paper
•
2512.21218
•
Published
•
62
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper
•
2512.20605
•
Published
•
59
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper
•
2512.19995
•
Published
•
14
P1: Mastering Physics Olympiads with Reinforcement Learning
Paper
•
2511.13612
•
Published
•
134
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper
•
2511.08567
•
Published
•
33
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper
•
2511.06221
•
Published
•
131
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Paper
•
2511.12982
•
Published
•
3
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
Paper
•
2509.07894
•
Published
•
31