InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue Paper • 2510.13747 • Published Oct 15, 2025 • 30
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding Paper • 2508.21496 • Published Aug 29, 2025 • 55
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction Paper • 2502.11663 • Published Feb 17, 2025 • 40