Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach Paper • 2502.03639 • Published Feb 5, 2025 • 9
Scalable Ranked Preference Optimization for Text-to-Image Generation Paper • 2410.18013 • Published Oct 23, 2024 • 15
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation Paper • 2411.04967 • Published Nov 7, 2024 • 1
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published Dec 12, 2024 • 28
Wonderland: Navigating 3D Scenes from a Single Image Paper • 2412.12091 • Published Dec 16, 2024 • 16
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device Paper • 2412.10494 • Published Dec 13, 2024 • 2
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published Jun 6, 2024 • 38
SINE: SINgle Image Editing with Text-to-Image Diffusion Models Paper • 2212.04489 • Published Dec 8, 2022
TextCraftor: Your Text Encoder Can be Image Quality Controller Paper • 2403.18978 • Published Mar 27, 2024 • 15
EfficientFormer: Vision Transformers at MobileNet Speed Paper • 2206.01191 • Published Jun 2, 2022 • 1
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models Paper • 2305.17235 • Published May 26, 2023 • 2
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation Paper • 2206.07771 • Published Jun 15, 2022
iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis Paper • 2310.16167 • Published Oct 24, 2023 • 1
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29, 2024 • 35
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22, 2024 • 21