This repository contains a dummy Japanese tokenizer trained on the snow_simplified_japanese_corpus dataset. The tokenizer was trained using Hugging Face datasets in streaming mode.
You can use this tokenizer to tokenize Japanese sentences.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("ybelkada/japanese-dummy-tokenizer")
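To inspect what the loaded tokenizer produces, a small helper like the one below can pair each token with its vocabulary id. This is a minimal sketch: `token_id_pairs` is a hypothetical name, and it only assumes the object exposes the standard `tokenize` and `convert_tokens_to_ids` methods of Transformers tokenizers.

```python
from typing import Any, List, Tuple


def token_id_pairs(tokenizer: Any, sentence: str) -> List[Tuple[str, int]]:
    """Return (token, id) pairs for a sentence.

    `tokenizer` is assumed to expose `tokenize` and
    `convert_tokens_to_ids`, as Transformers tokenizers do.
    """
    # Split the sentence into subword tokens (no special tokens added).
    tokens = tokenizer.tokenize(sentence)
    # Map each token to its integer id in the vocabulary.
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return list(zip(tokens, ids))
```

With the `tokenizer` loaded above, calling `token_id_pairs(tokenizer, "今日はいい天気です。")` lists the pieces the sentence is split into. The exact pieces depend on the trained vocabulary.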
See the file tokenizer.py, which you can freely adapt to other datasets. This tokenizer is based on the tokenizer from csebuetnlp/mT5_multilingual_XLSum.
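For adapting the approach to another dataset, the following sketch shows one way to train a tokenizer from a streamed dataset. It is not the contents of tokenizer.py. The function names, the `text_key` parameter, and the default `vocab_size` are assumptions chosen for illustration, and `train_new_from_iterator` is only available when the base tokenizer loads as a fast tokenizer.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batch_texts(
    examples: Iterable[Dict[str, Any]], text_key: str, batch_size: int = 1000
) -> Iterator[List[str]]:
    """Yield lists of texts from an iterable of examples, batch by batch."""
    it = iter(examples)
    while True:
        # Pull at most `batch_size` examples without materializing the stream.
        batch = [example[text_key] for example in islice(it, batch_size)]
        if not batch:
            return
        yield batch


def train_tokenizer(dataset_name: str, text_key: str, vocab_size: int = 32000):
    """Train a new tokenizer on a streamed dataset (hypothetical helper)."""
    # Imported here so batch_texts stays usable without these libraries.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    # streaming=True iterates over the data without downloading it in full.
    stream = load_dataset(dataset_name, split="train", streaming=True)
    # Reuse the base tokenizer's pipeline and learn a new vocabulary.
    base = AutoTokenizer.from_pretrained("csebuetnlp/mT5_multilingual_XLSum")
    return base.train_new_from_iterator(
        batch_texts(stream, text_key), vocab_size=vocab_size
    )
```

To try it, pass a dataset identifier and the name of the column that holds the text, then call `save_pretrained` on the returned tokenizer. The column name depends on the dataset and should be checked on its card.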