-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2502.14282
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 193 -
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper • 2502.14739 • Published • 105 -
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper • 2502.14502 • Published • 91 -
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 20
-
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper • 2501.13200 • Published • 69 -
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Paper • 2502.09604 • Published • 36 -
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Paper • 2502.10458 • Published • 37 -
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 20
-
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 52 -
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 24 -
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 88 -
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper • 2408.00764 • Published • 1
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 1
-
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
Paper • 2502.17110 • Published • 13 -
WebGames: Challenging General-Purpose Web-Browsing AI Agents
Paper • 2502.18356 • Published • 13 -
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Paper • 2502.18906 • Published • 12 -
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Paper • 2503.02268 • Published • 11
-
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 20 -
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
Paper • 2502.16111 • Published • 9 -
Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Paper • 2503.06580 • Published • 19 -
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 8
-
Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
Paper • 2410.08328 • Published -
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Paper • 2305.17390 • Published • 3 -
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper • 2501.13200 • Published • 69 -
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
Paper • 2502.11098 • Published • 13
-
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 27 -
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 31 -
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper • 2404.13013 • Published • 32 -
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper • 2404.06512 • Published • 31
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 1
-
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
Paper • 2502.17110 • Published • 13 -
WebGames: Challenging General-Purpose Web-Browsing AI Agents
Paper • 2502.18356 • Published • 13 -
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Paper • 2502.18906 • Published • 12 -
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Paper • 2503.02268 • Published • 11
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 193 -
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper • 2502.14739 • Published • 105 -
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper • 2502.14502 • Published • 91 -
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 20
-
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 20 -
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
Paper • 2502.16111 • Published • 9 -
Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Paper • 2503.06580 • Published • 19 -
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 8
-
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper • 2501.13200 • Published • 69 -
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Paper • 2502.09604 • Published • 36 -
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Paper • 2502.10458 • Published • 37 -
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 20
-
Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
Paper • 2410.08328 • Published -
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Paper • 2305.17390 • Published • 3 -
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper • 2501.13200 • Published • 69 -
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
Paper • 2502.11098 • Published • 13
-
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 52 -
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 24 -
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 88 -
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper • 2408.00764 • Published • 1
-
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 27 -
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 31 -
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper • 2404.13013 • Published • 32 -
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper • 2404.06512 • Published • 31