SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning Paper • 2505.16186 • Published May 22 • 7
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models Paper • 2505.21523 • Published May 23 • 14
Hidden in Plain Sight: Probing Implicit Reasoning in Multimodal Language Models Paper • 2506.00258 • Published May 30 • 3
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Paper • 2507.13428 • Published 15 days ago • 15
Agents of Change: Self-Evolving LLM Agents for Strategic Planning Paper • 2506.04651 • Published Jun 5 • 8
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation Paper • 2206.08522 • Published Jun 17, 2022
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space Paper • 2505.15778 • Published May 21 • 17
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents Paper • 2504.00906 • Published Apr 1 • 24
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models Paper • 2310.03903 • Published Oct 5, 2023 • 1
Neuro-Symbolic Procedural Planning with Commonsense Prompting Paper • 2206.02928 • Published Jun 6, 2022
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models Paper • 2407.12366 • Published Jul 17, 2024 • 4
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing Paper • 2410.12836 • Published Oct 3, 2024
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research Paper • 1904.03493 • Published Apr 6, 2019
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration Paper • 2501.13896 • Published Jan 23
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 Paper • 2502.12659 • Published Feb 18 • 7