Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published 26 days ago • 495
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 17 days ago • 76
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 14 days ago • 87
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation Paper • 2604.18168 • Published 8 days ago • 96
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published 8 days ago • 87
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off Paper • 2604.13902 • Published 13 days ago • 61