Can you please submit this to the leaderboard?

#15

by gblazex - opened Jan 13, 2024

Qwen org Jan 13, 2024

We (@mlabonne, @chiphuyen & me) are trying to do a correlation analysis between human judgement and different benchmarks,
and the Chat version of this model is missing from the Hugging Face leaderboard.

(base model exists but it's different)

Can you guys please submit the 14B chat version to the Hugging Face leaderboard as well?

context: https://twitter.com/gblazex/status/1737574824753467647

Yhyu13

Jan 13, 2024

@gblazex Qwen needs trust_remote_code set to True, which HF would not accept since its evaluation machines are not sandboxed.

gblazex

Qwen org Jan 14, 2024

@Yhyu13 that is great info, thank you! So basically the tokenizer would need to be added to the Hugging Face transformers library?

jklj077

Qwen org Jan 17, 2024

In fact, the modeling and tokenization both need merging for the leaderboard to work.
Currently, the base models (as foundation models) are manually run by HF staff (that's why they are on the leaderboard). I don't think the chat models can enjoy the privilege though.
We plan to merge the code with transformers, but no schedule can be confirmed now.
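
For anyone following along, here is a rough sketch of what the remote-code requirement looks like in practice. It assumes the repo id is Qwen/Qwen-14B-Chat and that the transformers library is installed; the helper names (remote_code_kwargs, load_chat_model) are made up for illustration and are not part of any library. Both the tokenizer and the model are loaded with trust_remote_code=True, because both are defined by Python files shipped in the model repo rather than in transformers itself.

```python
# Assumed repo id for the chat model discussed in this thread.
MODEL_ID = "Qwen/Qwen-14B-Chat"


def remote_code_kwargs(trust: bool = True) -> dict:
    """Keyword arguments shared by the tokenizer and model loaders (hypothetical helper)."""
    return {"trust_remote_code": trust}


def load_chat_model(model_id: str = MODEL_ID):
    """Load tokenizer and model, allowing code from the model repo to execute locally."""
    # Imported lazily so the helper above works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = remote_code_kwargs()
    # Both calls download and run Python files from the model repo,
    # which is why an unsandboxed evaluation machine would refuse this flag.
    tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    return tokenizer, model
```

Calling load_chat_model() downloads the full 14B checkpoint, so it is only worth doing on a machine where you are comfortable running the repo's code.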

gblazex

Qwen org Jan 17, 2024

@clefourrier can Qwen-14B-Chat get a manual run by HF staff to get on the leaderboard?

It would help us a lot in our quest to research the relationship between benchmarks,
and come up with a new representative suite based on them.

context: https://twitter.com/gblazex/status/1737574824753467647

Thank you

clefourrier

Qwen org Jan 17, 2024

Hi,
I'm sorry, we have adopted a policy of only running foundational models manually, as 1) they are the most important for the community, and 2) any manual eval is a lot of added work and we don't have the bandwidth.
However, you can follow our instructions and run the eval yourself if you need results before the code is merged.

gblazex

Qwen org Jan 17, 2024

no worries, thank you!
