Which GPQA subset is used?

#10

by maybe10086 - opened 5 days ago

Hello! While reading the Qwen2 technical report, I noticed that Qwen2 achieved excellent results on the GPQA benchmark. I'm wondering which specific subset of GPQA was used in the evaluation? Was it diamond, main, extended, or experts? Since the GPQA dataset has different subsets, knowing exactly which one was used would help us better understand the model's capabilities and make fair comparisons.
