Lessons from the Trenches on Reproducible Evaluation of Language Models Paper • 2405.14782 • Published May 23, 2024 • 1
Running 85 Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 85 Evaluate multilingual models using FineTasks
Running 3.57k The Ultra-Scale Playbook 🌌 3.57k The ultimate guide to training LLM on large GPU Clusters
The Art of Scaling Reinforcement Learning Compute for LLMs Paper • 2510.13786 • Published Oct 15 • 31
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 144
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26 • 58
Does your data spark joy? Performance gains from domain upsampling at the end of training Paper • 2406.03476 • Published Jun 5, 2024 • 4
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 250
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 138
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20, 2024 • 45