MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 260
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision Paper • 2506.06253 • Published Jun 6 • 9
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Paper • 2506.05328 • Published Jun 5 • 20
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21 • 66
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4 • 13
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 95
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published Jan 20 • 6
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding Paper • 2412.12075 • Published Dec 16, 2024 • 1
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation Paper • 2111.02394 • Published Nov 3, 2021 • 2