Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers Paper • 2602.03510 • Published 4 days ago • 24
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published 4 days ago • 49
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published 18 days ago • 47
FrankenMotion: Part-level Human Motion Generation and Composition Paper • 2601.10909 • Published 22 days ago • 18
FlowAct-R1: Towards Interactive Humanoid Video Generation Paper • 2601.10103 • Published 23 days ago • 73
Klear: Unified Multi-Task Audio-Video Joint Generation Paper • 2601.04151 • Published about 1 month ago • 16
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published Jan 5 • 29
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published Jan 5 • 62
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published Jan 2 • 56
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations Paper • 2512.21004 • Published Dec 24, 2025 • 13
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 95
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published Dec 18, 2025 • 11
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published Dec 18, 2025 • 20
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling Paper • 2512.15702 • Published Dec 17, 2025 • 16
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives Paper • 2512.14699 • Published Dec 16, 2025 • 28