Submitted by AaronHuangWei 103 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs NVIDIA 184 1
Submitted by CheeryLJH 39 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs NJU-LINK Lab 20 1
Submitted by Monta3Pt 37 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States King's College London 2
Submitted by JingHaoZ 31 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment · 7 authors 33 1
Submitted by Xiaoye08 31 Spotlight on Token Perception for Multimodal Reinforcement Learning · 7 authors 24 3
Submitted by fenghora 26 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training · 5 authors 50 3
Submitted by DogNeverSleep 26 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration · 12 authors 3 2
Submitted by wenhu 24 BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions TIGER-Lab 6 2
Submitted by Lingaaaaaaa 23 Demystifying Reinforcement Learning in Agentic Reasoning · 5 authors 12 1
Submitted by HowieHwong 22 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data · 14 authors 26 2
Submitted by KiyotakaWang 21 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models InternSVG 31 2
Submitted by JinChengRen 20 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems OPPO-Personal-AI-Lab 1
Submitted by jeepliu 19 DocReward: A Document Reward Model for Structuring and Stylizing Microsoft Research 2
Submitted by YanAdjeNole 19 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs The Fin AI 1 1
Submitted by taesiri 17 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning HKUST 23 3
Submitted by lyabc 16 AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes Kuaishou Visual Generation and Interaction Center 2
Submitted by ganlinyang 15 Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning shanghai ailab 15 1
Submitted by wangchy 14 SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Meta Research 9 1
Submitted by LucasFang 13 CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images The University of Hong Kong 22 1
Submitted by Agorium 13 On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models Seoul National University 3 1
Submitted by huangsiteng 10 High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting DAMO Academy 1
Submitted by xxzcc 9 ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding Tencent 231 2
Submitted by isaacchung 7 HUME: Measuring the Human-Model Performance Gap in Text Embedding Task Massive Text Embedding Benchmark 2
Submitted by Albus-Chen 7 PEAR: Phase Entropy Aware Reward for Efficient Reasoning iNLP Lab @ SUTD 0 2
Submitted by SoroushMehraban 6 FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding Vector Institute 2
Submitted by wymanCV 6 Stable Video Infinity: Infinite-Length Video Generation with Error Recycling EPFL VITA Lab 113 2
Submitted by taesiri 5 LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference · 8 authors 1
Submitted by Jan150000 5 SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning · 10 authors 1
Submitted by xwjzds 5 The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs Amazon 4
Submitted by RickyDeSkywalker 4 GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving · 5 authors 1
Submitted by eaglew 4 oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning · 5 authors 1 4
Submitted by abenechehab 4 From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation · 8 authors 1 1
Submitted by jroh 4 World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge · 5 authors 2
Submitted by yuzc19 3 RePro: Training Language Models to Faithfully Recycle the Web for Pretraining Chenyan Xiong Research Group at CMU 1 1
Submitted by FeYuan 3 LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning · 6 authors 2
Submitted by iliashum 3 The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections · 14 authors 2
Submitted by Liang-ZX 3 VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing UC Berkeley 1
Submitted by taesiri 2 IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment · 10 authors 2
Submitted by liuganghuggingface 2 Graph Diffusion Transformers are In-Context Molecular Designers · 7 authors 4 2
Submitted by fz-rit-hf 2 Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmentation · 7 authors 2
Submitted by SipengZ 2 A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining University of California San Diego 0 1
Submitted by beckhamchen 1 AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model OPPO 2
Submitted by Ricky06662 1 ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models · 7 authors 2
Submitted by Neo111x 1 The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution · 5 authors 0 1
Submitted by kargaranamir 1 CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs · 3 authors 1 1
Submitted by zhihuang 1 Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior Zhi Huang Lab 2
Submitted by saadob12 - The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers WüNLP 0 1
Submitted by shellygolan - VLM-Guided Adaptive Negative Prompting for Creative Generation · 4 authors 1