Submitted by junkang0909 100 Quantile Advantage Estimation for Entropy-Safe Reasoning · 6 authors 9 2
Submitted by ztwang 88 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning · 9 authors 12 2
Submitted by taesiri 77 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing · 61 authors 45k 2
Submitted by hyun1905 54 ReviewScore: Misinformed Peer Review Detection with Large Language Models KAIST AI 2
Submitted by P2333 48 Language Models Can Learn from Verbal Feedback Without Scalar Rewards Sea AI Lab 2
Submitted by Wiselnn 28 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Intern Large Models 46 2
Submitted by lxxiao 27 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning · 11 authors 24 3
Submitted by bltnynk 27 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping KAIST AI 2
Submitted by xl-zhao 24 PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning · 5 authors 89 5
Submitted by LordNoah 21 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios · 18 authors 12 2
Submitted by ammarali32 20 COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning MTSAIR 2
Submitted by scikkk 19 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing LLMs for Reasoning 4 2
Submitted by yuna0x0 17 See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation · 10 authors 10 2
Submitted by Owen777 17 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer W2GenAI Lab 63 3
Submitted by luzimu 15 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning · 8 authors 2
Submitted by yuhangzang 15 SPARK: Synergistic Policy And Reward Co-Evolving Framework Intern Large Models 16 2
Submitted by abdo-eldesokey 15 Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation · 4 authors 1 2
Submitted by wuxiaojun 14 Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval DataArcTech Ltd. 18 3
Submitted by maksimko123 12 TUN3D: Towards Real-World Scene Understanding from Unposed Images · 7 authors 11 2
Submitted by LordNoah 8 D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents · 13 authors 2
Submitted by Orannue 8 UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models · 3 authors 19 2
Submitted by JunkaiZ 8 Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training Scale AI 2
Submitted by taesiri 5 Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning · 16 authors 2
Submitted by taesiri 5 X-Streamer: Unified Human World Modeling with Audiovisual Interaction · 10 authors 3
Submitted by dyong 3 WoW: Towards a World omniscient World model Through Embodied Interaction · 36 authors 2
Submitted by taesiri 3 FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing · 7 authors 4
Submitted by je1lee 3 ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models · 8 authors 0 2
Submitted by xiangan 2 LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training · 22 authors 3
Submitted by taesiri 2 Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation · 8 authors 2
Submitted by msadat97 2 HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models · 3 authors 2
Submitted by pranjalchitale 2 The role of synthetic data in Multilingual, Multi-cultural AI systems: Lessons from Indic Languages Microsoft 2
Submitted by chen-yingfa 1 StateX: Enhancing RNN Recall via Post-training State Expansion · 6 authors 1 2
Submitted by prasannareddyp 1 X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning · 6 authors 3 2
Submitted by zhilinw 1 RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards NVIDIA 2
Submitted by rywang37 1 CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization Microsoft 2
Submitted by s-jse 1 CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition Stanford Open Virtual Assistant Lab (OVAL) 2
Submitted by Julppe1 - Finding 3D Positions of Distant Objects from Noisy Camera Movement and Semantic Segmentation Sequences · 3 authors 2
Submitted by NikolaiSkripko - Instruction-Following Evaluation in Function Calling for Large Language Models · 1 author 1 3