JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published 6 days ago • 15
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors Paper • 2512.16915 • Published 16 days ago • 37
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification Paper • 2512.10756 • Published 23 days ago • 33
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published Nov 22, 2025 • 37
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published Nov 14, 2025 • 112
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 37
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge Paper • 2507.21183 • Published Jul 27, 2025 • 14
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14, 2025 • 306
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 25
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20, 2025 • 133
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7, 2025 • 82
WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation Paper • 2505.01490 • Published May 2, 2025 • 5
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31, 2025 • 76
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment Paper • 2410.05255 • Published Oct 7, 2024 • 5
DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks Paper • 2307.05628 • Published Jul 11, 2023 • 10
Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering Paper • 2402.00827 • Published Feb 1, 2024 • 2
LLaVa-NeXT Collection LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19, 2024 • 32
DreamReward: Text-to-3D Generation with Human Preference Paper • 2403.14613 • Published Mar 21, 2024 • 37