Weight subcloning: direct initialization of transformers using larger pretrained ones Paper • 2312.09299 • Published Dec 14, 2023 • 19
I see what you hear: a vision-inspired method to localize words Paper • 2210.13567 • Published Oct 24, 2022
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19, 2024 • 23