- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2506.16054
- Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
  Paper • 2505.18875 • Published • 42
- PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
  Paper • 2506.16054 • Published • 60
- SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2410.02367 • Published • 51
- Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation
  Paper • 2506.19852 • Published • 40
- Scaling Laws for Native Multimodal Models
  Paper • 2504.07951 • Published • 29
- Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
  Paper • 2504.08003 • Published • 49
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
  Paper • 2504.11468 • Published • 29
- Towards Learning to Complete Anything in Lidar
  Paper • 2504.12264 • Published • 10
- 1.58-bit FLUX
  Paper • 2412.18653 • Published • 85
- Region-Adaptive Sampling for Diffusion Transformers
  Paper • 2502.10389 • Published • 54
- One-step Diffusion Models with f-Divergence Distribution Matching
  Paper • 2502.15681 • Published • 8
- FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
  Paper • 2502.20126 • Published • 20
- Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
  Paper • 2501.16372 • Published • 11
- TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
  Paper • 2501.16937 • Published • 7
- Matryoshka Quantization
  Paper • 2502.06786 • Published • 30
- Identifying Sensitive Weights via Post-quantization Integral
  Paper • 2503.01901 • Published • 8
- ReZero: Enhancing LLM search ability by trying one-more-time
  Paper • 2504.11001 • Published • 15
- FonTS: Text Rendering with Typography and Style Controls
  Paper • 2412.00136 • Published • 1
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 98
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 153
- Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
  Paper • 2503.18446 • Published • 12
- Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
  Paper • 2503.20240 • Published • 22
- BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
  Paper • 2503.20672 • Published • 14
- Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
  Paper • 2503.20198 • Published • 4