VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Paper • 2501.00574 • Published Dec 31, 2024
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception Paper • 2509.21100 • Published 25 days ago
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published Apr 9
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning Paper • 2510.11606 • Published 7 days ago
Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale Paper • 2509.24910 • Published 21 days ago
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos Paper • 2506.10857 • Published Jun 12
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14