- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections including paper arxiv:2502.20545

- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 24
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 125
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 22
- SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
  Paper • 2502.20545 • Published • 22

- The Lessons of Developing Process Reward Models in Mathematical Reasoning
  Paper • 2501.07301 • Published • 100
- Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
  Paper • 2502.03544 • Published • 44
- FoNE: Precise Single-Token Number Embeddings via Fourier Features
  Paper • 2502.09741 • Published • 15
- SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
  Paper • 2502.20545 • Published • 22

- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
  Paper • 2502.19361 • Published • 28
- Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
  Paper • 2502.17407 • Published • 26
- Small Models Struggle to Learn from Strong Reasoners
  Paper • 2502.12143 • Published • 39
- Language Models can Self-Improve at State-Value Estimation for Better Search
  Paper • 2503.02878 • Published • 10

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 31
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 5

- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 40
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 47
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 38
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 48