DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published 18 days ago
Evaluating, Synthesizing, and Enhancing for Customer Support Conversation Paper • 2508.04423 • Published Aug 6
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11
docling-project/SmolDocling-256M-preview Image-Text-to-Text • 0.3B • Updated Sep 17
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published Jul 11, 2024
OLMo: Accelerating the Science of Language Models Paper • 2402.00838 • Published Feb 1, 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12, 2024