CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Paper • 2405.05949 • Published May 9, 2024 • 3
Multi-Reward as Condition for Instruction-based Image Editing Paper • 2411.04713 • Published Nov 6, 2024 • 1
Where do Large Vision-Language Models Look at when Answering Questions? Paper • 2503.13891 • Published Mar 18 • 8
Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published Apr 22 • 15