Article Back to The Future: Evaluating AI Agents on Predicting Future Events By vinid and 6 others • 11 days ago • 26
How to Train Your LLM Web Agent: A Statistical Diagnosis Paper • 2507.04103 • Published 23 days ago • 46
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Paper • 2406.11811 • Published Jun 17, 2024 • 16