GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset Paper • 2507.21033 • Published 3 days ago
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7, 2025
Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark Paper • 2504.13143 • Published Apr 17, 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10, 2025
ViLBench: A Suite for Vision-Language Process Reward Modeling Paper • 2503.20271 • Published Mar 26, 2025
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More Paper • 2502.03738 • Published Feb 6, 2025
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Paper • 2410.06244 • Published Oct 8, 2024
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23, 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published Jun 24, 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published Apr 15, 2024