Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.01006

GLM-4.1V-Thinking

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206
zai-org/GLM-4.1V-9B-Thinking

Image-Text-to-Text • 10B • Updated 24 days ago • 113k • • 685
zai-org/GLM-4.1V-9B-Base

Image-Text-to-Text • 10B • Updated 29 days ago • 3.79k • 38
Running

27

27

GLM-4.1V-9B-Thinking-API-Demo

🚀

THUDM/GLM-4.1V-9B-Thinking Demo

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published 28 days ago • 142
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 84
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 80
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

Multimodal Reasoning

about 17 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 8
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 40

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published 28 days ago • 142
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies

Paper • 2506.14477 • Published Jun 17
VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3 • 50
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27 • 63

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26 • 39
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published 29 days ago • 24
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published 30 days ago • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published about 1 month ago • 22

GLM-4.1V-Thinking

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206
zai-org/GLM-4.1V-9B-Thinking

Image-Text-to-Text • 10B • Updated 24 days ago • 113k • • 685
zai-org/GLM-4.1V-9B-Base

Image-Text-to-Text • 10B • Updated 29 days ago • 3.79k • 38
Running

27

27

GLM-4.1V-9B-Thinking-API-Demo

🚀

THUDM/GLM-4.1V-9B-Thinking Demo

Multimodal Reasoning

about 17 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 8
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 40

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published 28 days ago • 142
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published 28 days ago • 142
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies

Paper • 2506.14477 • Published Jun 17
VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3 • 50
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27 • 63

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 84
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 80
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published about 1 month ago • 206

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26 • 39
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published 29 days ago • 24
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published 30 days ago • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published about 1 month ago • 22

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs