Automating Updates for new models?

#11
by nahsor - opened

Several new models (the Claude 4.5 series, GPT-5, DeepSeek V3.1 Terminus, GLM 4.6, etc.) are missing from the leaderboard.

Is it possible to automate this leaderboard so that new models get evaluated and listed automatically?

Hi, thanks for bringing this up! We plan to add them soon. I have run into unusual issues with certain models and providers that are hard to anticipate, so I prefer to run the scripts and check the outputs manually before adding the results to the leaderboard.

Updates have been delayed because I have been occupied with other priorities.
