-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2505.10554
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Chain-of-Model Learning for Language Model
Paper • 2505.11820 • Published • 121 -
On Path to Multimodal Generalist: General-Level and General-Bench
Paper • 2505.04620 • Published • 83 -
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper • 2504.13263 • Published • 7
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 68 -
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614 • Published • 42 -
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 63 -
JudgeLRM: Large Reasoning Models as a Judge
Paper • 2504.00050 • Published • 62 -
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper • 2504.05599 • Published • 86
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 267 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 176 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 133 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 96
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Paper • 2505.09694 • Published • 19 -
End-to-End Vision Tokenizer Tuning
Paper • 2505.10562 • Published • 22 -
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.41k • 1.14k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 14 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 813 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 61
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 32 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 105
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614 • Published • 42 -
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 63 -
JudgeLRM: Large Reasoning Models as a Judge
Paper • 2504.00050 • Published • 62 -
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper • 2504.05599 • Published • 86
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 267 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 176 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 133 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 96
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Chain-of-Model Learning for Language Model
Paper • 2505.11820 • Published • 121 -
On Path to Multimodal Generalist: General-Level and General-Bench
Paper • 2505.04620 • Published • 83 -
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper • 2504.13263 • Published • 7
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Paper • 2505.09694 • Published • 19 -
End-to-End Vision Tokenizer Tuning
Paper • 2505.10562 • Published • 22 -
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 68 -
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120 -
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.41k • 1.14k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 14 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 813 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 61
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 32 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 105