VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published 10 days ago • 69
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Paper • 2507.08801 • Published 16 days ago • 29
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published 14 days ago • 48
Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation Paper • 2507.05963 • Published 20 days ago • 11
VMoBA: Mixture-of-Block Attention for Video Diffusion Models Paper • 2506.23858 • Published 28 days ago • 30
StreamDiT: Real-Time Streaming Text-to-Video Generation Paper • 2507.03745 • Published 23 days ago • 28
Fine-tuning Llama 2 70B using PyTorch FSDP Article • By smangrul and 3 others • Sep 13, 2023 • 27
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement Paper • 2506.07848 • Published Jun 9 • 4
🌞 May 2025 - Open works from the Chinese community Collection 43 items • Updated 6 days ago • 9
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3 • 58
MAGREF: Masked Guidance for Any-Reference Video Generation Paper • 2505.23742 • Published May 29 • 9
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published May 24 • 65
SageAttention2++: A More Efficient Implementation of SageAttention2 Paper • 2505.21136 • Published May 27 • 47
ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published May 26 • 17
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Paper • 2505.20292 • Published May 26 • 54
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published May 17 • 38