- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 30
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 15
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections including paper arxiv:2604.28185
- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
  Paper • 2605.00658 • Published • 84
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Paper • 2604.28185 • Published • 90
- Representation Fréchet Loss for Visual Generation
  Paper • 2604.28190 • Published • 31
- Co-Evolving Policy Distillation
  Paper • 2604.27083 • Published • 65

- AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
  Paper • 2602.17100 • Published • 4
- GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
  Paper • 2603.01059 • Published • 1
- Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
  Paper • 2603.00618 • Published
- Heterogeneous Agent Collaborative Reinforcement Learning
  Paper • 2603.02604 • Published • 195

- Parallelized Autoregressive Visual Generation
  Paper • 2412.15119 • Published • 53
- Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
  Paper • 2412.17153 • Published • 39
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
  Paper • 2412.18609 • Published • 17
- Visual Autoregressive Modeling for Instruction-Guided Image Editing
  Paper • 2508.15772 • Published • 10