InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding • Paper • 2401.09149 • Published Jan 17, 2024
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism • Paper • 2406.18485 • Published Jun 26, 2024
LongVILA: Scaling Long-Context Visual Language Models for Long Videos • Paper • 2408.10188 • Published Aug 19, 2024
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning • Paper • 2311.00257 • Published Nov 1, 2023