# NLLB for Low-Resource Middle Eastern Languages
This model is a fine-tuned version of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M). It achieves the following results on the evaluation set:
- Loss: 5.2654
- BLEU: 14.7793
- Gen Len: 14.8572
This model is fine-tuned to translate from the following languages into English (`eng_Latn`):
- Luri Bakhtiari (`bqi_Arab`)
- Gilaki (`glk_Arab`)
- Hawrami (`hac_Arab`)
- Laki (`lki_Arab`)
- Mazanderani (`mzn_Arab`)
- Southern Kurdish (`sdh_Arab`)
- Talysh (`tly_Arab`)
- Zazaki (`zza_Latn`)
## Intended uses & limitations
This model is trained to translate into English. It is not trained to translate from English.
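Below is a minimal inference sketch, assuming the standard `transformers` seq2seq API and that the language codes listed above are registered in this repository's tokenizer; the model ID is taken from this card, but the example sentence is a placeholder.

```python
# Minimal inference sketch: translate from one of the supported languages
# into English (eng_Latn). Assumes the custom codes above (e.g. zza_Latn)
# are registered in this repo's tokenizer.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "SinaAhmadi/NLLB-DOLMA"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="zza_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "..."  # placeholder: a Zazaki source sentence
inputs = tokenizer(text, return_tensors="pt")

# NLLB-style models generate the target-language token first,
# so force eng_Latn as the first decoded token.
eng_id = tokenizer.convert_tokens_to_ids("eng_Latn")
with torch.no_grad():
    output_ids = model.generate(**inputs, forced_bos_token_id=eng_id, max_length=64)

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```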
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reconstruction sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 50.0
- mixed_precision_training: Native AMP
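The sketch below shows one way these settings map onto `transformers`' `Seq2SeqTrainingArguments`; it is an assumed reconstruction for orientation, not the authors' training script, and `output_dir` is hypothetical.

```python
# Hypothetical reconstruction of the training configuration above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-dolma",        # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 16 * 4 = 64
    seed=42,
    optim="adamw_torch",            # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="polynomial",
    warmup_ratio=0.2,
    num_train_epochs=50,
    fp16=True,                      # native AMP mixed precision
    eval_strategy="epoch",          # the table below has one row per epoch
    predict_with_generate=True,     # needed for BLEU / Gen Len during eval
)
```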
### Training results
| Training Loss | Epoch | Step | Validation Loss | BLEU | Gen Len |
|---|---|---|---|---|---|
| 0.0023 | 1.0 | 396 | 5.0074 | 14.2158 | 14.4698 |
| 0.0024 | 2.0 | 792 | 5.0344 | 14.2734 | 14.3467 |
| 0.0027 | 3.0 | 1188 | 5.0521 | 14.2059 | 14.3137 |
| 0.0039 | 4.0 | 1584 | 5.0034 | 13.7306 | 14.7393 |
| 0.0069 | 5.0 | 1980 | 5.0078 | 13.8802 | 14.4926 |
| 0.013 | 6.0 | 2376 | 4.9957 | 13.5899 | 14.494 |
| 0.0165 | 7.0 | 2772 | 4.9971 | 13.324 | 14.9148 |
| 0.0195 | 8.0 | 3168 | 4.9949 | 13.5516 | 14.4363 |
| 0.0218 | 9.0 | 3564 | 4.9608 | 13.6364 | 14.1306 |
| 0.0249 | 10.0 | 3960 | 4.9907 | 13.1309 | 14.3164 |
| 0.0237 | 11.0 | 4356 | 4.9949 | 13.389 | 14.4307 |
| 0.0183 | 12.0 | 4752 | 5.0267 | 13.4564 | 14.6526 |
| 0.0212 | 13.0 | 5148 | 5.0724 | 13.59 | 14.2952 |
| 0.0158 | 14.0 | 5544 | 5.0832 | 13.3564 | 14.5018 |
| 0.0149 | 15.0 | 5940 | 5.0480 | 13.71 | 14.4261 |
| 0.0152 | 16.0 | 6336 | 5.0454 | 13.3368 | 14.4033 |
| 0.0179 | 17.0 | 6732 | 5.0282 | 13.2518 | 14.4889 |
| 0.0139 | 18.0 | 7128 | 5.0397 | 13.4478 | 14.5729 |
| 0.0124 | 19.0 | 7524 | 5.1244 | 13.418 | 14.4207 |
| 0.0107 | 20.0 | 7920 | 5.1304 | 13.4141 | 14.5943 |
| 0.0104 | 21.0 | 8316 | 5.0841 | 13.6054 | 14.0954 |
| 0.0121 | 22.0 | 8712 | 5.0961 | 13.4688 | 14.6354 |
| 0.0086 | 23.0 | 9108 | 5.1330 | 13.5374 | 14.4979 |
| 0.0097 | 24.0 | 9504 | 5.1155 | 13.4956 | 14.4816 |
| 0.0074 | 25.0 | 9900 | 5.1742 | 13.8177 | 14.3275 |
| 0.0058 | 26.0 | 10296 | 5.1479 | 13.6641 | 14.219 |
| 0.0058 | 27.0 | 10692 | 5.1932 | 13.7447 | 14.1751 |
| 0.0044 | 28.0 | 11088 | 5.1611 | 13.488 | 14.7169 |
| 0.0083 | 29.0 | 11484 | 5.1577 | 13.8153 | 14.3556 |
| 0.0053 | 30.0 | 11880 | 5.2061 | 14.1224 | 14.1012 |
| 0.0046 | 31.0 | 12276 | 5.2480 | 13.9126 | 14.5045 |
| 0.0054 | 32.0 | 12672 | 5.1965 | 14.019 | 14.16 |
| 0.0035 | 33.0 | 13068 | 5.1847 | 14.004 | 14.4037 |
| 0.0032 | 34.0 | 13464 | 5.2124 | 14.228 | 14.2273 |
| 0.0024 | 35.0 | 13860 | 5.2090 | 14.2703 | 14.0995 |
| 0.0029 | 36.0 | 14256 | 5.2327 | 13.7593 | 14.604 |
| 0.0043 | 37.0 | 14652 | 5.2005 | 14.3019 | 14.0886 |
| 0.0022 | 38.0 | 15048 | 5.2218 | 14.2565 | 14.1928 |
| 0.0031 | 39.0 | 15444 | 5.2403 | 14.1208 | 14.438 |
| 0.0022 | 40.0 | 15840 | 5.2507 | 14.2927 | 14.3079 |
| 0.0014 | 41.0 | 16236 | 5.2558 | 14.2727 | 14.2874 |
| 0.0021 | 42.0 | 16632 | 5.2735 | 14.1117 | 14.1115 |
| 0.0013 | 43.0 | 17028 | 5.2707 | 14.4166 | 14.1923 |
| 0.0021 | 44.0 | 17424 | 5.2790 | 14.4223 | 14.2129 |
| 0.0016 | 45.0 | 17820 | 5.2758 | 14.486 | 14.2625 |
| 0.0019 | 46.0 | 18216 | 5.2546 | 14.5501 | 14.2695 |
| 0.0011 | 47.0 | 18612 | 5.2654 | 14.6166 | 14.1882 |
| 0.0016 | 48.0 | 19008 | 5.2610 | 14.5838 | 14.2617 |
| 0.0011 | 49.0 | 19404 | 5.2642 | 14.5987 | 14.2119 |
| 0.001 | 49.8743 | 19750 | 5.2645 | 14.576 | 14.2289 |
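The BLEU and Gen Len columns are standard generation metrics; the sketch below shows how they can be computed with the `evaluate` library, assuming a `compute_metrics` hook in the usual `Seq2SeqTrainer` style (the function body is illustrative, not taken from the training code).

```python
# Illustrative metric computation for BLEU and generation length,
# in the style of the standard Seq2SeqTrainer compute_metrics hook.
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SinaAhmadi/NLLB-DOLMA")
sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # Label padding is stored as -100; restore pad tokens before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    bleu = sacrebleu.compute(
        predictions=[p.strip() for p in decoded_preds],
        references=[[l.strip()] for l in decoded_labels],
    )
    # Gen Len: mean number of non-pad tokens in the generated sequences.
    gen_len = np.mean(
        [np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]
    )
    return {"bleu": bleu["score"], "gen_len": gen_len}
```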
### Framework versions
- Transformers 4.48.0.dev0
- PyTorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
## Model tree for SinaAhmadi/NLLB-DOLMA

Base model: [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)