Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations Paper • 2506.04633 • Published Jun 5 • 19
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning Paper • 2507.12508 • Published 14 days ago • 25
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published 13 days ago • 71
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published 13 days ago • 221
Taming generative video models for zero-shot optical flow extraction Paper • 2507.09082 • Published 19 days ago • 11
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29 • 68
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs Paper • 2506.21656 • Published Jun 26 • 14
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published Dec 31, 2024 • 23
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs Paper • 2505.15804 • Published May 21 • 10
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published Mar 25 • 36
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Paper • 2506.04308 • Published Jun 4 • 43
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1 • 66
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation Paper • 2502.13143 • Published Feb 18 • 31
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 30
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark Paper • 2412.07825 • Published Dec 10, 2024 • 12
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135 • Published Jun 3 • 38
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published 23 days ago • 45
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published 28 days ago • 34