Used as the Boss of Other Agents!
Somehow the best at testing! Others may contain more paradigms and even more data, but somehow this one sits at the top of the leaderboard for testing!
VERY GOOD MODEL (high scores): 78.9 average.
@misc{open-llm-leaderboard-v2,
  author = {Clémentine Fourrier and Nathan Habib and Alina Lozovskaya and Konrad Szafer and Thomas Wolf},
  title = {Open LLM Leaderboard v2},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard}",
}

@software{eval-harness,
  author = {Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and Phang, Jason and Reynolds, Laria and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
  title = {A framework for few-shot language model evaluation},
  month = sep,
  year = 2021,
  publisher = {Zenodo},
  version = {v0.0.1},
  doi = {10.5281/zenodo.5371628},
  url = {https://doi.org/10.5281/zenodo.5371628},
}
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 20.32 |
| IFEval (0-Shot) | 43.71 |
| BBH (3-Shot) | 31.70 |
| MATH Lvl 5 (4-Shot) | 6.72 |
| GPQA (0-shot) | 4.81 |
| MuSR (0-shot) | 12.43 |
| MMLU-PRO (5-shot) | 22.57 |
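The leaderboard average in the table appears to be the plain unweighted mean of the six benchmark scores, which the numbers bear out. A minimal sketch, assuming that averaging rule:

```python
# Six Open LLM Leaderboard v2 benchmark scores from the table above.
scores = {
    "IFEval (0-Shot)": 43.71,
    "BBH (3-Shot)": 31.70,
    "MATH Lvl 5 (4-Shot)": 6.72,
    "GPQA (0-shot)": 4.81,
    "MuSR (0-shot)": 12.43,
    "MMLU-PRO (5-shot)": 22.57,
}

# Unweighted arithmetic mean, rounded to two decimals.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 20.32
```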
Downloads last month: 70
Model tree for LeroyDyer/LCARS_TOP_SCORE

Base model: liminerity/M7-7b
Evaluation results (Open LLM Leaderboard)

- IFEval (0-Shot), strict accuracy: 43.710
- BBH (3-Shot), normalized accuracy: 31.700
- MATH Lvl 5 (4-Shot), exact match: 6.720
- GPQA (0-shot), acc_norm: 4.810
- MuSR (0-shot), acc_norm: 12.430
- MMLU-PRO (5-shot) test set, accuracy: 22.570