tianchi007's Collections: llm_pretrain
Phi-4 Technical Report • arXiv:2412.08905 • 119 upvotes
Evaluating and Aligning CodeLLMs on Human Preference • arXiv:2412.05210 • 51 upvotes
Evaluating Language Models as Synthetic Data Generators • arXiv:2412.03679 • 49 upvotes
Yi-Lightning Technical Report • arXiv:2412.01253 • 29 upvotes
Large Language Model-Brained GUI Agents: A Survey • arXiv:2411.18279 • 32 upvotes
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? • arXiv:2411.16489 • 49 upvotes
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions • arXiv:2411.14405 • 62 upvotes
Natural Language Reinforcement Learning • arXiv:2411.14251 • 31 upvotes
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use • arXiv:2411.10323 • 35 upvotes
Large Language Models Can Self-Improve in Long-context Reasoning • arXiv:2411.08147 • 67 upvotes
A Survey of Small Language Models • arXiv:2410.20011 • 44 upvotes
GPT-4o System Card • arXiv:2410.21276 • 85 upvotes
Qwen2.5-Coder Technical Report • arXiv:2409.12186 • 150 upvotes
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search • arXiv:2408.08152 • 60 upvotes
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence • arXiv:2406.11931 • 65 upvotes
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers • arXiv:2408.06195 • 74 upvotes
Qwen2.5 Technical Report • arXiv:2412.15115 • 372 upvotes
arXiv:2412.13501 • 30 upvotes
DeepSeek-V3 Technical Report • arXiv:2412.19437 • 67 upvotes
Direct Language Model Alignment from Online AI Feedback • arXiv:2402.04792 • 33 upvotes
Solving math word problems with process- and outcome-based feedback • arXiv:2211.14275 • 10 upvotes
The Lessons of Developing Process Reward Models in Mathematical Reasoning • arXiv:2501.07301 • 100 upvotes
Evolving Deeper LLM Thinking • arXiv:2501.09891 • 116 upvotes
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking • arXiv:2501.09751 • 49 upvotes
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models • arXiv:2501.09686 • 41 upvotes
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • arXiv:2502.02737 • 240 upvotes
s1: Simple test-time scaling • arXiv:2501.19393 • 125 upvotes
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • arXiv:2501.17161 • 123 upvotes
Qwen2.5-1M Technical Report • arXiv:2501.15383 • 72 upvotes
Best Practices and Lessons Learned on Synthetic Data for Language Models • arXiv:2404.07503 • 32 upvotes
Pre-training Small Base LMs with Fewer Tokens • arXiv:2404.08634 • 36 upvotes
CodecLM: Aligning Language Models with Tailored Synthetic Data • arXiv:2404.05875 • 18 upvotes
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent • arXiv:2404.03648 • 29 upvotes
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline • arXiv:2404.02893 • 23 upvotes
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model • arXiv:2503.24290 • 63 upvotes
START: Self-taught Reasoner with Tools • arXiv:2503.04625 • 114 upvotes
Large Language Model Agent: A Survey on Methodology, Applications and Challenges • arXiv:2503.21460 • 78 upvotes
A Comprehensive Survey on Long Context Language Modeling • arXiv:2503.17407 • 50 upvotes
Gemma 3 Technical Report • arXiv:2503.19786 • 53 upvotes
Open Deep Search: Democratizing Search with Open-source Reasoning Agents • arXiv:2503.20201 • 48 upvotes
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond • arXiv:2503.21614 • 42 upvotes
Why Do Multi-Agent LLM Systems Fail? • arXiv:2503.13657 • 48 upvotes
Process-based Self-Rewarding Language Models • arXiv:2503.03746 • 40 upvotes
Rethinking Reflection in Pre-Training • arXiv:2504.04022 • 80 upvotes
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning • arXiv:2504.07128 • 86 upvotes
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning • arXiv:2505.24726 • 264 upvotes
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • arXiv:2506.01939 • 173 upvotes
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • arXiv:2505.24864 • 133 upvotes
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models • arXiv:2505.22617 • 126 upvotes
Scaling Test-time Compute for LLM Agents • arXiv:2506.12928 • 61 upvotes
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time • arXiv:2505.24863 • 95 upvotes
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models • arXiv:2506.06395 • 129 upvotes