VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse • 2512.14531 • Published Dec 16, 2025
TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies • 2511.23225 • Published Nov 28, 2025
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale • 2601.22146 • Published Jan 29
Momentum Attention: The Physics of In-Context Learning and Spectral Forensics for Mechanistic Interpretability • 2602.04902 • Published Feb 3
Superpositional Gradient Descent: Harnessing Quantum Principles for Model Training • 2511.01918 • Published Nov 1, 2025
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats • 2510.25602 • Published Oct 29, 2025