Elron

Update README.md

7a27cfa verified 3 days ago

metadata

title: README
emoji: 🏆
colorFrom: purple
colorTo: purple
pinned: true
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5fc0292de45c5468456e022b/KO5DPLsX9nwW4cMz-3CkE.png
license: apache-2.0

Open Agent Leaderboard

An open benchmark for comparing full AI agent systems across diverse real-world tasks, reporting both quality and cost.

Unlike model-only benchmarks, we evaluate the complete agent — the model, the tools, the planning strategy, the error recovery — as a single system. The same model can produce very different results depending on the agent wrapped around it.

Website: exgentic.ai
Results: open-agent-leaderboard/results
Leaderboard: open-agent-leaderboard/leaderboard
Blog: open-agent-leaderboard/blog
Framework: Exgentic
Paper: arXiv:2602.22953

Submit results

Run evaluations using Exgentic and open a PR on the results dataset.
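The PR step could be scripted with the `huggingface_hub` client, as in the minimal sketch below. It assumes you already have a results file produced by Exgentic. The `submissions/<submitter>/` folder layout, the file name, and the helper names `build_upload_kwargs` and `submit` are hypothetical, so check the results dataset for the layout it actually expects.

```python
from pathlib import Path


def build_upload_kwargs(results_file: str, submitter: str) -> dict:
    """Assemble arguments for HfApi.upload_file so the upload opens a PR."""
    name = Path(results_file).name
    return {
        "path_or_fileobj": results_file,
        # Hypothetical layout: the dataset's real folder structure may differ.
        "path_in_repo": f"submissions/{submitter}/{name}",
        "repo_id": "open-agent-leaderboard/results",
        "repo_type": "dataset",
        # Open a pull request instead of committing to the main branch.
        "create_pr": True,
        "commit_message": f"Add results from {submitter}",
    }


def submit(results_file: str, submitter: str):
    """Upload one results file as a PR and return the resulting commit info."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the locally saved token or the HF_TOKEN variable
    return api.upload_file(**build_upload_kwargs(results_file, submitter))
```

Calling `submit("results.json", "my-team")` after logging in to the Hub should open a pull request on the dataset; the returned object exposes the PR link as `pr_url`.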