---
title: README
emoji: 🏆
colorFrom: purple
colorTo: purple
pinned: true
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5fc0292de45c5468456e022b/KO5DPLsX9nwW4cMz-3CkE.png
license: apache-2.0
---

# Open Agent Leaderboard

An open benchmark for comparing full AI agent systems across diverse real-world tasks. It reports both quality and cost.

Unlike model-only benchmarks, we evaluate the complete agent — the model, the tools, the planning strategy, the error recovery — as a single system. The same model can produce very different results depending on the agent wrapped around it.

- **Website**: [exgentic.ai](https://www.exgentic.ai)
- **Results**: [open-agent-leaderboard/results](https://huggingface.co/datasets/open-agent-leaderboard/results)
- **Leaderboard**: [open-agent-leaderboard/leaderboard](https://huggingface.co/spaces/open-agent-leaderboard/leaderboard)
- **Blog**: [open-agent-leaderboard/blog](https://huggingface.co/spaces/open-agent-leaderboard/blog)
- **Framework**: [Exgentic](https://github.com/Exgentic/exgentic)
- **Paper**: [arXiv:2602.22953](https://arxiv.org/abs/2602.22953)

## Submit results

Run your evaluations with [Exgentic](https://github.com/Exgentic/exgentic), then open a pull request (PR) on the [results dataset](https://huggingface.co/datasets/open-agent-leaderboard/results).
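
One way to open that PR from a script is with the `huggingface_hub` client, sketched below. This is a minimal sketch, not an official submission tool: the helper name `submit_results` and both file paths are hypothetical, and the expected layout of files inside the results dataset is not specified here, so check the dataset's own instructions before uploading. It also assumes you are already authenticated, for example via a token with write access.

```python
from pathlib import Path
from typing import Optional

# Dataset repository linked above; everything else here is illustrative.
RESULTS_REPO = "open-agent-leaderboard/results"


def submit_results(local_file: str, path_in_repo: str, token: Optional[str] = None) -> Optional[str]:
    """Upload one local results file to the results dataset as a pull request.

    `local_file` and `path_in_repo` are placeholders: the required file name
    and directory layout are assumptions, not something this README defines.
    Returns the URL of the created pull request, if one is reported.
    """
    if not Path(local_file).is_file():
        raise FileNotFoundError(local_file)

    # Imported here so the path check above works even before the
    # dependency is installed (pip install huggingface_hub).
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    commit = api.upload_file(
        path_or_fileobj=local_file,
        path_in_repo=path_in_repo,
        repo_id=RESULTS_REPO,
        repo_type="dataset",
        create_pr=True,  # open a PR instead of committing to the main branch
        commit_message=f"Add results: {path_in_repo}",
    )
    return commit.pr_url
```

A call might look like `submit_results("outputs/results.json", "my-agent/results.json")`, where both paths are made up for illustration. Opening the PR through the Hugging Face web interface works just as well.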