Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3, 2025 • 69
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Paper • 2406.20085 • Published Jun 28, 2024 • 13
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection Paper • 2401.02361 • Published Jan 4, 2024
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13, 2025 • 37
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published Mar 18, 2025 • 49
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25, 2025 • 74
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published Jun 25, 2024 • 19
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20, 2024 • 34