Submitted by lastdefiance20 101 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI · 10 authors 22 1
Submitted by KangLiao 86 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation · 9 authors 76 1
Submitted by hyeoncho01 38 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling · 6 authors 5 2
Submitted by YuminChoi 35 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs KAIST AI 4 1
Submitted by taesiri 22 StreamingVLM: Real-Time Understanding for Infinite Video Streams · 7 authors 112 1
Submitted by weirayao 22 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Salesforce 43 1
Submitted by taesiri 21 BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution BigCode 37 2
Submitted by lulululuyi 21 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? LongCat 11 1
Submitted by taesiri 16 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km · 11 authors 23 2
Submitted by yqi19 16 BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities Peking University 14 1
Submitted by Yunzhen 12 Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting · 5 authors 1
Submitted by JoeYing 11 ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping · 10 authors 1
Submitted by arubique 11 DISCO: Diversifying Sample Condensation for Efficient Model Evaluation Eberhard Karls Universität Tübingen 1
Submitted by arashmarioriyad 9 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization · 6 authors 1
Submitted by Kurt232 7 Which Heads Matter for Reasoning? RL-Guided KV Cache Compression · 5 authors 1
Submitted by yanchi3dv 7 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction · 2 authors 15 1
Submitted by Leo-Dai 6 StatEval: A Comprehensive Benchmark for Large Language Models in Statistics Shanghai University of Finance and Economics 1
Submitted by siyue 6 MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval · 8 authors 1
Submitted by jasonyux 5 Dyna-Mind: Learning to Simulate from Experience for Better AI Agents · 9 authors 1
Submitted by cmhungsteve 5 TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control · 7 authors 1
Submitted by taesiri 5 PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs · 9 authors 14 1
Submitted by jacksukk 5 Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition · 7 authors 1
Submitted by dd101bb 5 Parallel Test-Time Scaling for Latent Reasoning Models The Hong Kong Polytechnic University 1 1
Submitted by kotekjedi 4 Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols CLAIRE Lab @EPFL 1
Submitted by demfier 4 ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review · 4 authors 1
Submitted by Rbin 4 LightReasoner: Can Small Language Models Teach Large Language Models Reasoning? Data Intelligence Lab@HKU 1
Submitted by taesiri 3 Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models · 12 authors 8 1
Submitted by ssz1111 3 A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks · 8 authors 1
Submitted by Ruggero1912 3 One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework · 6 authors 7 1
Submitted by tytyt 2 Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation · 10 authors 1
Submitted by zsqzz 2 GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare University of Illinois at Urbana-Champaign 0 2
Submitted by LawrenceLiu 2 ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization University of California, Los Angeles 1
Submitted by WenyaoZhang 1 Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation · 10 authors 5 1
Submitted by Jessemel 1 How to Teach Large Multimodal Models New Skills University of Illinois at Urbana-Champaign 13 1
Submitted by nielsr 1 Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models Massachusetts Institute of Technology 27 1
Submitted by EasonFan 1 ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall · 8 authors 1
Submitted by Sajib-006 1 LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology Virginia Polytechnic Institute and State University 1
Submitted by jlbaker361 1 MONKEY: Masking ON KEY-Value Activation Adapter for Personalization · 1 author 1
Submitted by cmhungsteve 1 Temporal Prompting Matters: Rethinking Referring Video Object Segmentation · 6 authors 1
Submitted by avanturist 1 ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL · 3 authors 14 1