metadata
language: mn
license: mit
tags:
  - mongolian
  - tokenizer
  - sentencepiece

SentencePiece Tokenizer

This repository contains a SentencePiece tokenizer fine-tuned on Mongolian text.

Files

  • tokenizer_config.json: The tokenizer configuration file
  • mn_tokenizer.model: The SentencePiece model file
  • mn_tokenizer.vocab: The SentencePiece vocabulary file
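
To inspect the vocabulary without loading the model, the `.vocab` file can be read directly. The sketch below assumes `mn_tokenizer.vocab` uses the usual SentencePiece layout of one `piece<TAB>score` pair per line, which this repository does not explicitly confirm. `read_vocab` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path


def read_vocab(path):
    """Return a list of (piece, score) tuples from a SentencePiece .vocab file.

    Assumes each non-empty line has the form "piece<TAB>score".
    """
    entries = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last tab so the piece text is kept intact.
            piece, _, score = line.rpartition("\t")
            entries.append((piece, float(score)))
    return entries


if __name__ == "__main__":
    # Assumes the file has been downloaded into the working directory.
    vocab_path = Path("mn_tokenizer.vocab")
    if vocab_path.exists():
        vocab = read_vocab(vocab_path)
        print(len(vocab), "pieces")
        print(vocab[:5])
```

In SentencePiece vocabularies the `▁` character marks a word boundary, so pieces beginning with it start a new word.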

Usage

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Namuun123/mn_sentencepiece_tokenizer")
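
Once the tokenizer is loaded, a quick round trip is a reasonable sanity check. The following is a minimal sketch, not the author's own test: `roundtrip` is a hypothetical helper that relies only on the standard `tokenize`, `convert_tokens_to_ids`, and `decode` methods of a `transformers` tokenizer.

```python
def roundtrip(tokenizer, text):
    """Tokenize text, map the tokens to ids, and decode the ids back to a string."""
    tokens = tokenizer.tokenize(text)              # subword pieces
    ids = tokenizer.convert_tokens_to_ids(tokens)  # vocabulary indices
    decoded = tokenizer.decode(ids)                # text rebuilt from the ids
    return tokens, ids, decoded
```

With the `tokenizer` object from the snippet above, `roundtrip(tokenizer, "Сайн байна уу?")` returns the pieces, their ids, and the decoded string. The sample sentence is only illustrative. Whether the decoded text matches the input exactly depends on the tokenizer's normalization settings.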