Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content Paper • 2506.20331 • Published Jun 25
Gaperon: A Peppered English-French Generative Language Model Suite Paper • 2510.25771 • Published Oct 29
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck Paper • 2404.07647 • Published Apr 11, 2024
On the Scaling Laws of Geographical Representation in Language Models Paper • 2402.19406 • Published Feb 29, 2024
Anisotropy Is Inherent to Self-Attention in Transformers Paper • 2401.12143 • Published Jan 22, 2024
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling Paper • 2212.07284 • Published Dec 14, 2022
Headless Language Models: Learning without Predicting with Contrastive Weight Tying Paper • 2309.08351 • Published Sep 15, 2023