gradientai/Llama-3-8B-Instruct-Gradient-1048k • Text Generation • 8B • Updated Oct 29, 2024 • 26.8k • 678
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation • Paper • 2412.11919 • Published Dec 16, 2024 • 37
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs • Paper • 2412.18925 • Published Dec 25, 2024 • 105
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM • Paper • 2501.01904 • Published Jan 3 • 34
VideoRAG: Retrieval-Augmented Generation over Video Corpus • Paper • 2501.05874 • Published Jan 10 • 73
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • Paper • 2502.08910 • Published Feb 13 • 149
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference • Paper • 2502.18137 • Published Feb 25 • 58
Slamming: Training a Speech Language Model on One GPU in a Day • Paper • 2502.15814 • Published Feb 19 • 70
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs • Paper • 2504.00072 • Published Mar 31 • 7
ReZero: Enhancing LLM search ability by trying one-more-time • Paper • 2504.11001 • Published Apr 15 • 15
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation • Paper • 2506.09991 • Published Jun 11 • 56
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • Paper • 2506.13585 • Published Jun 16 • 259
Demystifying the Visual Quality Paradox in Multimodal Large Language Models • Paper • 2506.15645 • Published Jun 18 • 4
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching • Paper • 2506.20480 • Published Jun 25 • 7
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study • Paper • 2506.19794 • Published Jun 24 • 8
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test • Paper • 2506.21551 • Published Jun 26 • 28
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs • Paper • 2507.07990 • Published 18 days ago • 44