DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation Paper • 2307.01831 • Published Jul 4, 2023 • 8
Mask-Attention-Free Transformer for 3D Instance Segmentation Paper • 2309.01692 • Published Sep 4, 2023 • 1
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17, 2025 • 41
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26, 2025 • 55
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging Paper • 2503.02783 • Published Mar 4, 2025 • 6
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 21