Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22 • 63
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models Paper • 2504.13367 • Published Apr 17 • 25
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14, 2024 • 79
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published Jun 12, 2024 • 8
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29, 2024 • 22
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 29