-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 45 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 24
Collections
Discover the best community collections!
Collections including paper arxiv:2503.10633
-
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Paper • 2304.09842 • Published • 2 -
ReAct: Synergizing Reasoning and Acting in Language Models
Paper • 2210.03629 • Published • 27 -
Gorilla: Large Language Model Connected with Massive APIs
Paper • 2305.15334 • Published • 5 -
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper • 2303.11366 • Published • 5
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 130 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 59 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 15 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 71
-
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Paper • 2410.09008 • Published • 17 -
answerdotai/ModernBERT-base
Fill-Mask • 0.1B • Updated • 1.42M • 932 -
answerdotai/ModernBERT-large
Fill-Mask • 0.4B • Updated • 1.26M • 422 -
microsoft/phi-4
Text Generation • 15B • Updated • 1.19M • 2.16k
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 12 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 45 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 24
-
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Paper • 2304.09842 • Published • 2 -
ReAct: Synergizing Reasoning and Acting in Language Models
Paper • 2210.03629 • Published • 27 -
Gorilla: Large Language Model Connected with Massive APIs
Paper • 2305.15334 • Published • 5 -
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper • 2303.11366 • Published • 5
-
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Paper • 2410.09008 • Published • 17 -
answerdotai/ModernBERT-base
Fill-Mask • 0.1B • Updated • 1.42M • 932 -
answerdotai/ModernBERT-large
Fill-Mask • 0.4B • Updated • 1.26M • 422 -
microsoft/phi-4
Text Generation • 15B • Updated • 1.19M • 2.16k
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 130 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 59 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 15 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 71
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 12 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49