Yura Choi

Yuuraa

AI & ML interests

Large Multimodal Models, Video Understanding

Organizations

None yet

Recent Activity

upvoted a paper 6 days ago

Yume: An Interactive World Generation Model

Paper • 2507.17744 • Published 7 days ago • 77

upvoted 4 papers 11 days ago

Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations

Paper • 2506.04633 • Published Jun 5 • 19

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

Paper • 2507.12508 • Published 14 days ago • 25

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published 13 days ago • 71

A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published 13 days ago • 221

upvoted a paper 13 days ago

Taming generative video models for zero-shot optical flow extraction

Paper • 2507.09082 • Published 19 days ago • 11

upvoted 11 papers 15 days ago

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Paper • 2505.23747 • Published May 29 • 68

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Paper • 2506.21656 • Published Jun 26 • 14

MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Paper • 2501.00316 • Published Dec 31, 2024 • 23

STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs

Paper • 2505.15804 • Published May 21 • 10

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?

Paper • 2503.19990 • Published Mar 25 • 36

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Paper • 2506.04308 • Published Jun 4 • 43

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

Paper • 2504.00883 • Published Apr 1 • 66

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Paper • 2502.13143 • Published Feb 18 • 31

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Paper • 2401.12168 • Published Jan 22, 2024 • 30

3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

Paper • 2412.07825 • Published Dec 10, 2024 • 12

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Paper • 2506.03135 • Published Jun 3 • 38

upvoted a paper 20 days ago

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling

Paper • 2507.05240 • Published 23 days ago • 45

upvoted a paper 22 days ago

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Paper • 2507.01955 • Published 28 days ago • 34

liked a dataset 3 months ago

sankalpsinha77/MARVEL-40M

Preview • Updated Jun 21 • 22 • 3