The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published 4 days ago • 41
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published 4 days ago • 41
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published 12 days ago • 106
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8 • 78
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25 • 9
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published Mar 6 • 21 • 8