Rabbits Leaderboard 💊: Visualize and analyze language model robustness to drug name synonyms
Open LLM Leaderboard 🏆: Track, rank and evaluate open LLMs and chatbots
MMLU-Pro Leaderboard 🥇: More advanced and challenging multi-task evaluation