Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback Paper • 2506.03106 • Published Jun 3 • 6
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing Paper • 2506.09965 • Published Jun 11 • 3
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km Paper • 2510.09606 • Published Oct 10 • 17
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 13 days ago • 30
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published 18 days ago • 28
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs Paper • 2505.21327 • Published May 27 • 83
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Paper • 2505.17018 • Published May 22 • 15
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 49
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 104
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024 • 26
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024 • 26
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024 • 26
Roadmap towards Superhuman Speech Understanding using Large Language Models Paper • 2410.13268 • Published Oct 17, 2024 • 34