Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 13 days ago • 140
Macaron-A2UI: A Model for Generative UI in Personal Agents Paper • 2605.24830 • Published 17 days ago • 81
physics-intern: an Autonomous Agent for Physics Research • Explore an autonomous AI workflow for physics research • 49
Lightning Unified Video Editing via In-Context Sparse Attention Paper • 2605.04569 • Published May 6 • 18
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 166
EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model Paper • 2604.10268 • Published Apr 11 • 12
Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents Article • nvidia • Apr 28 • 62
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118