-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2409.11355
-
Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era
Paper • 2305.06131 • Published • 2 -
Perpetual Humanoid Control for Real-time Simulated Avatars
Paper • 2305.06456 • Published • 1 -
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Paper • 2305.10973 • Published • 37 -
LDM3D: Latent Diffusion Model for 3D
Paper • 2305.10853 • Published • 11
-
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Paper • 2408.14176 • Published • 63 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 62 -
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Paper • 2409.01199 • Published • 14
-
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 33 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 62 -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237 • Published • 43 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32
-
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper • 2406.09416 • Published • 30 -
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 32 -
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Paper • 2407.17365 • Published • 13 -
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Paper • 2408.11001 • Published • 12
-
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Paper • 2402.10210 • Published • 36 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32 -
Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 30 -
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Paper • 2308.04079 • Published • 188
-
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 141 -
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Paper • 2409.14988 • Published • 24 -
LLM Pretraining with Continuous Concepts
Paper • 2502.08524 • Published • 29
-
Controllable Text Generation for Large Language Models: A Survey
Paper • 2408.12599 • Published • 66 -
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 37 -
Real-Time Video Generation with Pyramid Attention Broadcast
Paper • 2408.12588 • Published • 17 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 62
-
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Paper • 2311.10708 • Published • 17 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 116 -
NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published • 75 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32
-
Zero-shot Image Editing with Reference Imitation
Paper • 2406.07547 • Published • 34 -
Scalable Autoregressive Image Generation with Mamba
Paper • 2408.12245 • Published • 27 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32 -
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 26
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Paper • 2402.10210 • Published • 36 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32 -
Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 30 -
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Paper • 2308.04079 • Published • 188
-
Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era
Paper • 2305.06131 • Published • 2 -
Perpetual Humanoid Control for Real-time Simulated Avatars
Paper • 2305.06456 • Published • 1 -
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Paper • 2305.10973 • Published • 37 -
LDM3D: Latent Diffusion Model for 3D
Paper • 2305.10853 • Published • 11
-
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 141 -
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Paper • 2409.14988 • Published • 24 -
LLM Pretraining with Continuous Concepts
Paper • 2502.08524 • Published • 29
-
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Paper • 2408.14176 • Published • 63 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 62 -
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Paper • 2409.01199 • Published • 14
-
Controllable Text Generation for Large Language Models: A Survey
Paper • 2408.12599 • Published • 66 -
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 37 -
Real-Time Video Generation with Pyramid Attention Broadcast
Paper • 2408.12588 • Published • 17 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 62
-
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 33 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 62 -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237 • Published • 43 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32
-
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Paper • 2311.10708 • Published • 17 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 116 -
NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published • 75 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32
-
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper • 2406.09416 • Published • 30 -
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 32 -
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Paper • 2407.17365 • Published • 13 -
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Paper • 2408.11001 • Published • 12
-
Zero-shot Image Editing with Reference Imitation
Paper • 2406.07547 • Published • 34 -
Scalable Autoregressive Image Generation with Mamba
Paper • 2408.12245 • Published • 27 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32 -
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 26