Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published 4 days ago • 29
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter Paper • 2407.11298 • Published Jul 16, 2024 • 6
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13 • 28
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities Paper • 2510.08759 • Published 25 days ago • 45