ReMamba: Equip Mamba with Effective Long-Sequence Modeling (arXiv:2408.15496, published Aug 28, 2024)
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism (arXiv:2406.03853, published Jun 6, 2024)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration (arXiv:2404.12022, published Apr 18, 2024)
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression (arXiv:2310.15594, published Oct 24, 2023)
Lifting the Curse of Capacity Gap in Distilling Language Models (arXiv:2305.12129, published May 20, 2023)