Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data • Paper • 2505.05427 • Published May 8, 2025
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies • Paper • 2404.06395 • Published Apr 9, 2024