Submitted by Jakumetsu 122 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use National University of Singapore 184 3
Submitted by janchorowski 105 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Pathway 447 2
Submitted by taesiri 103 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play · 9 authors 22 2
Submitted by Jessamine 57 Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning alibaba 3
Submitted by weizhepei 45 TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning AI at Meta 2
Submitted by Junlinh 35 Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training · 7 authors 2
Submitted by Ningyu 30 OceanGym: A Benchmark Environment for Underwater Embodied Agents Zhejiang University 28 2
Submitted by xytian1008 29 More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models · 8 authors 12 2
Submitted by xx18 26 Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners Tencent 2
Submitted by han-cai 26 DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder NVIDIA 2
Submitted by wjldw 24 Who's Your Judge? On the Detectability of LLM-Generated Judgments Data Mining and Machine Learning lab 2
Submitted by Minbyul 18 Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training Korea University 2
Submitted by Zigeng 17 dParallel: Learnable Parallel Decoding for dLLMs National University of Singapore 14 2
Submitted by hewei2001 15 VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications LongCat 3 2
Submitted by Fiaa 14 Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs Princeton University 2
Submitted by JiayiGuo821 13 IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance SHI Labs 25 2
Submitted by WENGSYX 12 DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively Text Intelligence Lab of Westlake University 6 4
Submitted by flateon 12 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation · 5 authors 2 2
Submitted by JusperLee 12 Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention Tsinghua University 9 2
Submitted by soujanyaporia 9 OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always! Deep Cognition and Language Research (DeCLaRe) Lab 2
Submitted by linyueqian 8 Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap · 9 authors 2 2
Submitted by RyanLiu112 7 Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Tsinghua University 3
Submitted by burtenshaw 7 A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects · 4 authors 2
Submitted by taesiri 7 VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes · 9 authors 13 2
Submitted by yshenaw 7 Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective Microsoft Research 2
Submitted by kittttttt 4 Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs · 5 authors 2 2
Submitted by dtanow 3 DeepCodeSeek: Real-Time API Retrieval for Context-Aware Code Generation ServiceNow-AI 2
Submitted by dlion168 2 TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics · 15 authors 2
Submitted by sachithabey 2 EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting Nanyang Technological University Singapore 21 2
Submitted by hanxiao 2 jina-reranker-v3: Last but Not Late Interaction for Document Reranking Jina AI 2
Submitted by Kamichanw 2 d²Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching · 7 authors 8 2
Submitted by Divij 2 BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software Cogint ASU 2
Submitted by normanpaulsen 2 Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs · 1 author 2
Submitted by taesiri 1 Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark · 64 authors 2
Submitted by stockeh 1 Swift: An Autoregressive Consistency Model for Efficient Weather Forecasting · 3 authors 3 2
Submitted by KejiaRobust 1 MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification · 5 authors 2
Submitted by YYF42 1 CORRECT: COndensed eRror RECognition via knowledge Transfer in multi-agent systems · 7 authors 2
Submitted by Qingren 1 Estimating Time Series Foundation Model Transferability via In-Context Learning · 6 authors 2
Submitted by agneet - Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation Stability AI 2
Submitted by EdBianchi - ProfVLM: A Lightweight Video-Language Model for Multi-View Proficiency Estimation · 3 authors 2
Submitted by ZhangShenao - Learning to Reason as Action Abstractions with Scalable Mid-Training RL · 7 authors 2
Submitted by jonhue - Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models LAS @ ETH Zurich 2 2
Submitted by buxiangzhiren - GeoRemover: Removing Objects and Their Causal Visual Artifacts · 6 authors 0 2
Submitted by YuhengSSS - Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception The University of Sydney 4 2