- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2504.08685
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 98
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
  Paper • 2506.08009 • Published • 26
- Seeing Voices: Generating A-Roll Video from Audio with Mirage
  Paper • 2506.08279 • Published • 28
- PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
  Paper • 2506.07848 • Published • 4

- Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
  Paper • 2504.08685 • Published • 129
- MegaTTS3 Demo
  85
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents
  Paper • 2501.12326 • Published • 62
- VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
  Paper • 2503.13444 • Published • 17