WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 4 days ago • 35
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model Paper • 2604.19747 • Published 24 days ago • 39
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 23 days ago • 240
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents Paper • 2604.04979 • Published Apr 4 • 10
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 324
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published Apr 9 • 101
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 628
AURA: Always-On Understanding and Real-Time Assistance via Video Streams Paper • 2604.04184 • Published Apr 5 • 50
Emergent Compositional Communication for Latent World Properties Paper • 2604.03266 • Published Mar 18 • 7
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 502
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published Mar 27 • 43