Submitted by foggyforest 56 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE HITsz-Text and Multimodal Generative Intelligence Group (TMG)
Submitted by imlixinyang 55 FlashWorld: High-quality 3D Scene Generation within Seconds · 6 authors
Submitted by yangcole 45 Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization alibaba-inc
Submitted by sinwang 38 LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models OpenMOSS (SII, Fudan NLP)
Submitted by menghao22 36 Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs Open-Bee
Submitted by taesiri 31 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning · 5 authors
Submitted by tongww 28 InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue · 26 authors
Submitted by JakeOh 24 ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs FuriosaAI
Submitted by taesiri 23 Trace Anything: Representing Any Video in 4D via Trajectory Fields ByteDance Seed
Submitted by lyclyc52 23 CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving · 9 authors
Submitted by taesiri 22 Generative Universal Verifier as Multimodal Meta-Reasoner ByteDance Seed
Submitted by Snyhlxde 19 Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs · 7 authors
Submitted by taesiri 13 InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy · 29 authors
Submitted by HowieHwong 13 The Role of Computing Resources in Publishing Foundation Model Research · 11 authors
Submitted by Kaichengalex 10 UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning · 9 authors
Submitted by zhongshsh 10 What Generative Search Engines Like and How to Optimize Web Content Cooperatively · 4 authors
Submitted by jackyhate 9 Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark Vchitect
Submitted by Jiakui 9 Universal Image Restoration Pre-training via Masked Degradation Classification Peking University
Submitted by 2toINF 9 X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model · 15 authors
Submitted by taki555 8 Revisiting Model Interpolation for Efficient Reasoning The University of Hong Kong
Submitted by DavidLeon 8 FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model · 8 authors
Submitted by taesiri 4 Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math Salesforce
Submitted by Student-Xiaoji 4 CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving Tsinghua University
Submitted by YerbaPage 4 HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication · 8 authors
Submitted by YZCS 2 Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention HKUSTGZ
Submitted by taicheng 2 MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training Amazon
Submitted by YerbaPage 2 GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search · 8 authors
Submitted by danjacobellis 1 Dedelayed: Deleting remote inference delay via on-device correction · 5 authors
Submitted by augustus2011 1 Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs Character-lab
Submitted by GarfieldX 1 Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain · 10 authors
Submitted by HankYe 1 KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems Duke Center for Computational Evolutionary Intelligence (CEI)
Submitted by ayshrv 1 Point Prompting: Counterfactual Tracking with Video Diffusion Models · 4 authors
Submitted by prasannamayil 1 MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model · 4 authors
Submitted by DanielSc4 1 EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling Cohere Labs
Submitted by martinagvilas - Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning Microsoft Research
Submitted by ml1996 - Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation · 13 authors