Add evaluation results: HLE, GPQA Diamond, SWE-bench Verified, Terminal-Bench 2.0

#2
Opened by SaylorTwift (HF Staff)
No description provided.

+1. Which agent was used for the Terminal-Bench 2.0 evaluation?

