Seeing Fast and Slow: Learning the Flow of Time in Videos Paper • 2604.21931 • Published 4 days ago • 17
WorldMark: A Unified Benchmark Suite for Interactive Video World Models Paper • 2604.21686 • Published 4 days ago • 35
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation Paper • 2604.18168 • Published 7 days ago • 96
SWE-chat: Coding Agent Interactions From Real Users in the Wild Paper • 2604.20779 • Published 5 days ago • 10
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation Paper • 2604.19636 • Published 6 days ago • 85
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents Paper • 2604.18543 • Published 7 days ago • 27
MultiWorld: Scalable Multi-Agent Multi-View Video World Models Paper • 2604.18564 • Published 7 days ago • 43
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published 7 days ago • 80
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 12 days ago • 112
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 12 days ago • 152
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published 19 days ago • 94
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 17 days ago • 48
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 18 days ago • 240
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 18 days ago • 42
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 123