- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

Collections including paper arxiv:2502.15007

- CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
  Paper • 2401.01275 • Published • 1
- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 20
- PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering
  Paper • 2402.16288 • Published • 1
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
  Paper • 2502.14802 • Published • 13

- Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
  Paper • 2502.15499 • Published • 14
- MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
  Paper • 2502.17422 • Published • 7
- The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
  Paper • 2502.17535 • Published • 8
- Scaling LLM Pre-training with Vocabulary Curriculum
  Paper • 2502.17910 • Published • 1

- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 193
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
  Paper • 2502.14739 • Published • 105
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
  Paper • 2502.14502 • Published • 91
- PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
  Paper • 2502.14282 • Published • 20

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 35
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 28
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 23

- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 38
- LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
  Paper • 2502.15007 • Published • 175
- Transformers without Normalization
  Paper • 2503.10622 • Published • 167
- Forgetting Transformer: Softmax Attention with a Forget Gate
  Paper • 2503.02130 • Published • 32

- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 24
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- Intuitive physics understanding emerges from self-supervised pretraining on natural videos
  Paper • 2502.11831 • Published • 20
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
  Paper • 2502.13063 • Published • 73

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 49
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 623
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 72
- Humanoid Locomotion as Next Token Prediction
  Paper • 2402.19469 • Published • 29