Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO Paper • 2511.16669 • Published 12 days ago • 31
The Smol Training Playbook 📚: The secrets to building world-class LLMs Space • Featured • 2.5k
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published 28 days ago • 51
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published 26 days ago • 207
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28 • 70
Qwen3 Omni Demo ⚡: Interact with a multimodal chatbot using text, audio, images, or video Space • Featured • 186
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models Paper • 2510.13626 • Published Oct 15 • 45