Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes By ybelkada and 1 other • Aug 17, 2022 • 103
Git Re-Basin: Merging Models modulo Permutation Symmetries Paper • 2209.04836 • Published Sep 11, 2022 • 2
Article Unlocking Longer Generation with Key-Value Cache Quantization By RaushanTurganbay • May 16, 2024 • 49
Article Don't repeat yourself - 🤗 Transformers Design Philosophy By patrickvonplaten • Apr 5, 2022 • 37
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 14