InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 220
Video Analysis and Generation via a Semantic Progress Function Paper • 2604.22554 • Published 7 days ago • 59
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 4 days ago • 62
Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation Paper • 2604.19141 • Published 10 days ago • 1
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 9 days ago • 237
DiffEM: Learning from Corrupted Data with Diffusion Models via Expectation Maximization Paper • 2510.12691 • Published Dec 20, 2025 • 1
OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning Paper • 2603.24458 • Published Mar 25 • 9
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 29 days ago • 146
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published Mar 3 • 103
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published Mar 6 • 119
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published Mar 10 • 48
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5, 2025 • 82