OpenEvals
community
AI & ML interests
LLM evaluation
A small overview of our research collabs through the years
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 225
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 122
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 240
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
  Paper • 2412.03304 • Published • 19
This leaderboard evaluated 7K LLMs from Apr 2023 to Jun 2024 on ARC-c, HellaSwag, MMLU, TruthfulQA, Winogrande and GSM8K.
This leaderboard has been evaluating LLMs since Jun 2024 on IFEval, MuSR, GPQA, MATH, BBH and MMLU-Pro.
- Open-LLM performances are plateauing, let's make the leaderboard steep again
  Update leaderboard for fair model evaluation • 122
- Open LLM Leaderboard
  Track, rank and evaluate open LLMs and chatbots • 13.3k
- open-llm-leaderboard/contents
  Viewer • Updated • 4.58k • 10.1k • 18
- open-llm-leaderboard/results
  Preview • Updated • 17.4k • 15