Captain Cinema: Towards Short Movie Generation Paper • 2507.18634 • Published 3 days ago • 31 • 3
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published 10 days ago • 69
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Paper • 2507.08801 • Published 16 days ago • 29
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published 14 days ago • 48
Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation Paper • 2507.05963 • Published 20 days ago • 11 • 2
VMoBA: Mixture-of-Block Attention for Video Diffusion Models Paper • 2506.23858 • Published 28 days ago • 30
StreamDiT: Real-Time Streaming Text-to-Video Generation Paper • 2507.03745 • Published 23 days ago • 28
Article Fine-tuning Llama 2 70B using PyTorch FSDP By smangrul and 3 others • Sep 13, 2023 • 27
Running on Zero • 88 • VLM Object Understanding 🦀 • Explore object detection, visual grounding, keypoint Detecti
black-forest-labs/FLUX.1-Kontext-dev Image-to-Image • Updated about 1 month ago • 370k • 1.87k