MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published 6 days ago • 45
ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention Paper • 2507.01004 • Published 27 days ago • 10
OctoThinker-Llama-1B Family Collection What makes a base language model suitable for RL? Through controlled experiments, we identify key factors then leverage them to scale up mid-training. • 6 items • Updated 23 days ago • 2
OctoThinker-Llama-3B Family Collection What makes a base language model suitable for RL? Through controlled experiments, we identify key factors then leverage them to scale up mid-training. • 6 items • Updated 23 days ago • 2
OctoThinker-Llama-8B Family Collection What makes a base language model suitable for RL? Through controlled experiments, we identify key factors then leverage them to scale up mid-training. • 3 items • Updated 23 days ago • 3
Mid-training Analysis Checkpoints (Llama-3.2-3B) Collection What makes a base language model suitable for RL? Through controlled experiments, we identify key factors then leverage them to scale up mid-training. • 10 items • Updated 22 days ago • 1
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25 • 46
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective Paper • 2506.14965 • Published Jun 17 • 49
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Paper • 2505.15612 • Published May 21 • 34
NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning Paper • 2504.13941 • Published Apr 15 • 11
view article Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 286
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 32
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16 • 36
ProX Dataset Collection a collection of pre-training corpora refined by ProX • 6 items • Updated Feb 14 • 7
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 105