Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published 22 days ago • 53
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset Paper • 2507.03483 • Published 27 days ago • 23
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning Paper • 2506.16141 • Published Jun 19 • 27
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning Paper • 2506.09049 • Published Jun 10 • 36
Position: Interactive Generative Video as Next-Generation Game Engine Paper • 2503.17359 • Published Mar 21 • 62
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper • 2503.16408 • Published Mar 20 • 41 • 2
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception Paper • 2312.07472 • Published Dec 12, 2023 • 2
SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection Paper • 2309.07084 • Published Sep 13, 2023 • 1
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control Paper • 2403.12037 • Published Mar 18, 2024 • 1
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 20
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14 • 68
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation Paper • 2501.12612 • Published Jan 22
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8 • 15
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Paper • 2503.14487 • Published Mar 18 • 27