- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

Collections including paper arxiv:2502.15007

- CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
  Paper • 2401.01275 • Published • 1
- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 20
- PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering
  Paper • 2402.16288 • Published • 1
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
  Paper • 2502.14802 • Published • 13

- Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
  Paper • 2502.15499 • Published • 14
- MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
  Paper • 2502.17422 • Published • 7
- The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
  Paper • 2502.17535 • Published • 8
- Scaling LLM Pre-training with Vocabulary Curriculum
  Paper • 2502.17910 • Published • 1

- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 193
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
  Paper • 2502.14739 • Published • 105
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
  Paper • 2502.14502 • Published • 91
- PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
  Paper • 2502.14282 • Published • 20

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 35
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 28
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 23

- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 38
- LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
  Paper • 2502.15007 • Published • 175
- Transformers without Normalization
  Paper • 2503.10622 • Published • 167
- Forgetting Transformer: Softmax Attention with a Forget Gate
  Paper • 2503.02130 • Published • 32

- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 24
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- Intuitive physics understanding emerges from self-supervised pretraining on natural videos
  Paper • 2502.11831 • Published • 20
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
  Paper • 2502.13063 • Published • 73

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 49
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 623
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 72
- Humanoid Locomotion as Next Token Prediction
  Paper • 2402.19469 • Published • 29