VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation • Paper • 2308.14710 • Published Aug 28, 2023
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting • Paper • 2503.08677 • Published Mar 11 • 29
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models • Paper • 2505.19415 • Published May 26 • 2
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models • Paper • 2510.05034 • Published 27 days ago • 46
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale • Paper • 2504.16030 • Published Apr 22 • 37
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models • Paper • 2410.09733 • Published Oct 13, 2024 • 9