Daily Papers

by AK and the research community

Oct 16

Submitted by

foggyforest

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

HIT-TMG

HITsz-Text and Multimodal Generative Intelligence Group(TMG)

Submitted by

imlixinyang

FlashWorld: High-quality 3D Scene Generation within Seconds

·
6 authors

Submitted by

yangcole

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

alibaba-inc

Submitted by

sinwang

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

fnlp

OpenMOSS (SII, Fudan NLP)

Submitted by

menghao22

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Open-Bee

Submitted by

taesiri

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

·
5 authors

Submitted by

tongww

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

·
26 authors

Submitted by

JakeOh

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

furiosa-ai

Submitted by

taesiri

Trace Anything: Representing Any Video in 4D via Trajectory Fields

ByteDance-Seed

Submitted by

lyclyc52

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

·
9 authors

Submitted by

taesiri

Generative Universal Verifier as Multimodal Meta-Reasoner

ByteDance-Seed

Submitted by

Snyhlxde

Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs

·
7 authors

Submitted by

qizekun

Reasoning in Space via Grounding in the World

·
6 authors

Submitted by

taesiri

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

·
29 authors

Submitted by

HowieHwong

The Role of Computing Resources in Publishing Foundation Model Research

·
11 authors

Submitted by

Kaichengalex

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

·
9 authors

Submitted by

zhongshsh

What Generative Search Engines Like and How to Optimize Web Content Cooperatively

·
4 authors

Submitted by

nielsr

The Art of Scaling Reinforcement Learning Compute for LLMs

Submitted by

jackyhate

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

Vchitect

Submitted by

Jiakui

Universal Image Restoration Pre-training via Masked Degradation Classification

PekingUniversity

Peking University

Submitted by

2toINF

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

·
15 authors

Submitted by

taki555

Revisiting Model Interpolation for Efficient Reasoning

hkuhk

The University of Hong Kong

Submitted by

DavidLeon

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

·
8 authors

Submitted by

weizhiwang

Direct Multi-Token Decoding

·
3 authors

Submitted by

taesiri

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

Salesforce

Submitted by

hyx21

NOSA: Native and Offloadable Sparse Attention

·
4 authors

Submitted by

Student-Xiaoji

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

Tsinghua University

Submitted by

YerbaPage

HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication

·
8 authors

Submitted by

roeiherz

Learning to Grasp Anything by Playing with Random Toys

Berkeley

Submitted by

YZCS

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Submitted by

taicheng

MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

amazon

Submitted by

YerbaPage

GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search

·
8 authors

Submitted by

danjacobellis

Dedelayed: Deleting remote inference delay via on-device correction

·
5 authors

Submitted by

augustus2011

Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs

Submitted by

GarfieldX

Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain

·
10 authors

Submitted by

HankYe

KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

DukeCEICenter

Duke Center for Computational Evolutionary Intelligence (CEI)

Submitted by

ayshrv

Point Prompting: Counterfactual Tracking with Video Diffusion Models

·
4 authors

Submitted by

prasannamayil

MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model

·
4 authors

Submitted by

DanielSc4

EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

CohereLabs

Submitted by

wyu1

Don't Throw Away Your Pretrained Model

tencent

Submitted by

kmcollins

Evaluating Language Models' Evaluations of Games

·
12 authors

Submitted by

martinagvilas

Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning

MicrosoftResearch

Microsoft Research

Submitted by

ml1996

Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation

·
13 authors