Why not use PreTrainedTokenizerFast
#22 by zymu · opened
In tokenization_kimi.py, the base class is PreTrainedTokenizer, which appears to be a pure-Python implementation. Why not use PreTrainedTokenizerFast as the base class instead? Will the slow tokenizer affect end-to-end performance?
Also, the Python tokenizer is not sufficient for running the lm-evaluation-harness benchmark.
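For context, a minimal sketch of what a fast-tokenizer setup looks like (this is an illustration only, not Kimi's actual tokenizer): `PreTrainedTokenizerFast` can wrap a Rust-backed `tokenizers.Tokenizer` object directly, which is what makes the fast path possible. The tiny `WordLevel` vocab here is invented for the example and built in memory so nothing is downloaded.

```python
# Hedged sketch: wrapping a Rust-backed tokenizer in PreTrainedTokenizerFast,
# as opposed to a pure-Python PreTrainedTokenizer subclass.
# The vocab and tokenizer below are toy examples, not Kimi's real tokenizer.
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

vocab = {"[UNK]": 0, "hello": 1, "world": 2}
backend = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()  # split on whitespace before lookup

fast_tok = PreTrainedTokenizerFast(tokenizer_object=backend, unk_token="[UNK]")
print(fast_tok.is_fast)                      # True: backed by the Rust library
print(fast_tok("hello world")["input_ids"])  # [1, 2]
```

A slow (pure-Python) tokenizer instead subclasses `PreTrainedTokenizer` and implements tokenization in Python, which is typically slower for batch encoding and is what `is_fast == False` indicates.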