UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3 • 58
MAGREF: Masked Guidance for Any-Reference Video Generation Paper • 2505.23742 • Published May 29 • 9
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Paper • 2505.20292 • Published May 26 • 54
ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published May 26 • 17
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published Apr 3 • 58
MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation Paper • 2503.14428 • Published Mar 18 • 9
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Paper • 2503.10391 • Published Mar 13 • 11
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model Paper • 2411.17459 • Published Nov 26, 2024 • 11
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 34
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published Nov 26, 2024 • 38
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published Sep 2, 2024 • 14
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators Paper • 2404.05014 • Published Apr 7, 2024 • 35
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Paper • 2406.18522 • Published Jun 26, 2024 • 21