Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025
AraMix: Recycling, Refiltering, and Deduplicating to Deliver the Largest Arabic Pretraining Corpus Paper • 2512.18834 • Published 24 days ago
SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data Paper • 2511.18411 • Published Nov 23, 2025