Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks Paper • 2604.17761 • Published 4 days ago • 3
Meta-Harness: End-to-End Optimization of Model Harnesses Paper • 2603.28052 • Published 25 days ago • 19
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems Paper • 2512.06749 • Published Dec 7, 2025 • 28
An Empirical Study of Autoregressive Pre-training from Videos Paper • 2501.05453 • Published Jan 9, 2025 • 41
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 97
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published Nov 10, 2024 • 16
Sharingan: Extract User Action Sequence from Desktop Recordings Paper • 2411.08768 • Published Nov 13, 2024 • 9