Why not use PreTrainedTokenizerFast
#22 by zymu · opened
In tokenization_kimi.py, the base class is PreTrainedTokenizer, which appears to be a pure-Python implementation. Why not use PreTrainedTokenizerFast as the base class instead? Will the slow tokenizer affect end-to-end performance?
Also, the Python tokenizer is not sufficient for running the lm-evaluation-harness benchmark.
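For context, a minimal sketch of what a fast-tokenizer setup looks like (this is an illustration only, not Kimi's actual tokenizer): `PreTrainedTokenizerFast` can wrap a Rust-backed `tokenizers.Tokenizer` object directly, which is what makes the fast path possible. The tiny `WordLevel` vocab here is invented for the example and built in memory so nothing is downloaded.

```python
# Hedged sketch: wrapping a Rust-backed tokenizer in PreTrainedTokenizerFast,
# as opposed to a pure-Python PreTrainedTokenizer subclass.
# The vocab and tokenizer below are toy examples, not Kimi's real tokenizer.
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

vocab = {"[UNK]": 0, "hello": 1, "world": 2}
backend = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()  # split on whitespace before lookup

fast_tok = PreTrainedTokenizerFast(tokenizer_object=backend, unk_token="[UNK]")
print(fast_tok.is_fast)                      # True: backed by the Rust library
print(fast_tok("hello world")["input_ids"])  # [1, 2]
```

A slow (pure-Python) tokenizer instead subclasses `PreTrainedTokenizer` and implements tokenization in Python, which is typically slower for batch encoding and is what `is_fast == False` indicates.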