GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published 28 days ago • 202
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes Paper • 2504.15270 • Published Apr 21 • 10
ExpLLM: Towards Chain of Thought for Facial Expression Recognition Paper • 2409.02828 • Published Sep 4, 2024
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29, 2024 • 58
LVBench: An Extreme Long Video Understanding Benchmark Paper • 2406.08035 • Published Jun 12, 2024 • 1
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations Paper • 2402.04236 • Published Feb 6, 2024 • 8
LongAlign: A Recipe for Long Context Alignment of Large Language Models Paper • 2401.18058 • Published Jan 31, 2024 • 23
KoLA: Carefully Benchmarking World Knowledge of Large Language Models Paper • 2306.09296 • Published Jun 15, 2023 • 19
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation Paper • 2303.14655 • Published Mar 26, 2023