Submitted by tjpxiaoming 298 Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems · 47 authors 1.57k 7
Submitted by KennyUTC 69 Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing · 10 authors 79 2
Submitted by BestWishYsh 58 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation · 10 authors 285 3
Submitted by scofield7419 57 JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization · 11 authors 79 4
Submitted by danxuhk 49 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation · 8 authors 351 9
Submitted by ManTle 32 Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme · 5 authors 3
Submitted by gallilmaimon 32 Scaling Analysis of Interleaved Speech-Text Language Models · 4 authors 215 2
Submitted by yuanqianhao 25 ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers · 9 authors 10 2
Submitted by Franck-Dernoncourt 17 Efficient Model Selection for Time Series Forecasting via LLMs · 7 authors 2
Submitted by smajumdar94 16 OpenCodeReasoning: Advancing Data Distillation for Competitive Coding · 8 authors 3
Submitted by RyanLiu112 14 GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning · 11 authors 80 3
Submitted by tuphs 13 Interpreting Emergent Planning in Model-Free Reinforcement Learning · 5 authors 2
Submitted by universea 13 Scaling Laws in Scientific Discovery with AI and Robot Scientists · 10 authors 85 2
Submitted by Falcary 11 NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations · 9 authors 2
Submitted by shyamgopal 10 Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models · 5 authors 24 2
Submitted by zuazo 10 Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages · 4 authors 31 3
Submitted by bedio 6 Instruction-Guided Autoregressive Neural Network Parameter Generation · 4 authors 2