Difference between distilled and the original?

by ghosthamlet - opened Jul 22, 2022

Jul 22, 2022

Thanks for this great model.
The original model, https://huggingface.co/facebook/nllb-200-1.3B, has a pytorch_model.bin file of the same size as this distilled version.
So what is the difference between the two models?

vitvit

Sep 22, 2022

As I understand it (from the paper), this is a 1.3B-parameter model distilled from the full 54B NLLB-200 model. It gives better results than the 1.3B dense model (Table 41 in the paper).
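On the file size itself: a weights file is roughly the parameter count times the bytes stored per parameter, so two 1.3B checkpoints come out the same size no matter how each one was trained. Here is a rough back-of-the-envelope sketch. The helper `checkpoint_size_gb` is a made-up name, the parameter counts are the rounded figures from the model names, and 4 bytes per parameter (32-bit floats) is my assumption rather than something I checked against the actual files.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Rough weights-file size in GB: parameter count times bytes per parameter.

    bytes_per_param=4 assumes 32-bit floats (an assumption, not verified
    against the real checkpoints). File-format overhead is ignored.
    """
    return num_params * bytes_per_param / 1e9


# Rounded, nominal parameter counts taken from the model names.
dense_gb = checkpoint_size_gb(1.3e9)      # dense 1.3B model
distilled_gb = checkpoint_size_gb(1.3e9)  # distilled 1.3B model
teacher_gb = checkpoint_size_gb(54e9)     # 54B model it was distilled from

print(f"dense 1.3B:     ~{dense_gb:.1f} GB")
print(f"distilled 1.3B: ~{distilled_gb:.1f} GB")
print(f"54B teacher:    ~{teacher_gb:.0f} GB")
```

So matching file sizes only tell you the two models have the same number of parameters. The difference is in the weights themselves, which for the distilled version were trained to imitate the 54B model.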

ghosthamlet

Sep 29, 2022

Thanks for the reply.

ghosthamlet changed discussion status to closed Sep 29, 2022
