ea5503f5eb0f44f08690df71a7b5d2e3

This model is a fine-tuned version of google-t5/t5-base on the de-ru (German→Russian) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9953
  • Data Size: 1.0
  • Epoch Runtime: 106.0105
  • Bleu: 10.2895
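
A minimal inference sketch is shown below. The repository id matches this model's name; the "translate German to Russian:" task prefix is an assumption based on the standard T5 translation fine-tuning recipe and should be checked against the actual training configuration.

```python
# Minimal inference sketch. Assumptions: the repo id below and the
# "translate German to Russian: " task prefix (standard T5 translation setup).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/ea5503f5eb0f44f08690df71a7b5d2e3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "translate German to Russian: Das Buch liegt auf dem Tisch."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```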

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
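
Pending details from the authors, the dataset named above can be loaded as sketched below. opus_books ships only a train split, so the evaluation set was presumably carved out of it; the split ratio in the sketch is an assumption.

```python
# Sketch of loading the de-ru pair of opus_books. The 10% eval split
# and the seed are assumptions, not the authors' documented setup.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "de-ru")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)
print(splits["train"], splits["test"])
```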

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
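
A sketch of how these values map onto transformers' Seq2SeqTrainingArguments follows. The output directory and the predict_with_generate flag are assumptions; the reported total batch size of 32 arises from the per-device size of 8 across the 4 GPUs.

```python
# Sketch mirroring the listed hyperparameters; not the authors' exact script.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="t5-base-opus-books-de-ru",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total_train_batch_size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total_eval_batch_size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption: needed to report BLEU at eval
)
```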

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0     | 2.7597          | 0         | 8.4820        | 0.4253  |
| No log        | 1     | 434   | 2.6077          | 0.0078    | 10.2618       | 0.4821  |
| No log        | 2     | 868   | 2.3576          | 0.0156    | 10.6719       | 0.5806  |
| No log        | 3     | 1302  | 2.2020          | 0.0312    | 11.8069       | 0.7310  |
| No log        | 4     | 1736  | 2.0808          | 0.0625    | 16.5864       | 1.0534  |
| 0.0918        | 5     | 2170  | 1.9764          | 0.125     | 21.5360       | 0.9362  |
| 2.078         | 6     | 2604  | 1.8608          | 0.25      | 32.7242       | 2.0158  |
| 1.92          | 7     | 3038  | 1.7406          | 0.5       | 52.7636       | 2.8017  |
| 1.7925        | 8.0   | 3472  | 1.5954          | 1.0       | 100.6842      | 3.5782  |
| 1.6637        | 9.0   | 3906  | 1.4976          | 1.0       | 99.4240       | 4.2416  |
| 1.5808        | 10.0  | 4340  | 1.4214          | 1.0       | 105.0762      | 4.9118  |
| 1.5098        | 11.0  | 4774  | 1.3634          | 1.0       | 102.8156      | 5.4506  |
| 1.4356        | 12.0  | 5208  | 1.3186          | 1.0       | 104.7841      | 5.8554  |
| 1.3905        | 13.0  | 5642  | 1.2750          | 1.0       | 102.3238      | 6.1622  |
| 1.3419        | 14.0  | 6076  | 1.2443          | 1.0       | 104.0530      | 6.5960  |
| 1.3127        | 15.0  | 6510  | 1.2162          | 1.0       | 103.2813      | 6.7289  |
| 1.2715        | 16.0  | 6944  | 1.1914          | 1.0       | 101.6225      | 7.0543  |
| 1.2421        | 17.0  | 7378  | 1.1681          | 1.0       | 101.3635      | 7.2670  |
| 1.2025        | 18.0  | 7812  | 1.1462          | 1.0       | 104.1814      | 7.4852  |
| 1.1593        | 19.0  | 8246  | 1.1349          | 1.0       | 102.9349      | 7.7047  |
| 1.1375        | 20.0  | 8680  | 1.1171          | 1.0       | 103.0859      | 7.9432  |
| 1.1153        | 21.0  | 9114  | 1.1020          | 1.0       | 104.5360      | 8.0993  |
| 1.081         | 22.0  | 9548  | 1.0891          | 1.0       | 106.6906      | 8.2592  |
| 1.0659        | 23.0  | 9982  | 1.0768          | 1.0       | 103.5095      | 8.4539  |
| 1.0454        | 24.0  | 10416 | 1.0716          | 1.0       | 104.2793      | 8.5494  |
| 1.032         | 25.0  | 10850 | 1.0601          | 1.0       | 106.2393      | 8.7721  |
| 0.9979        | 26.0  | 11284 | 1.0589          | 1.0       | 107.3540      | 8.8809  |
| 0.9874        | 27.0  | 11718 | 1.0484          | 1.0       | 107.2710      | 8.9978  |
| 0.9461        | 28.0  | 12152 | 1.0379          | 1.0       | 113.0742      | 9.1393  |
| 0.9435        | 29.0  | 12586 | 1.0340          | 1.0       | 114.6773      | 9.2591  |
| 0.9268        | 30.0  | 13020 | 1.0300          | 1.0       | 103.2993      | 9.2599  |
| 0.8959        | 31.0  | 13454 | 1.0233          | 1.0       | 103.4103      | 9.3891  |
| 0.8819        | 32.0  | 13888 | 1.0211          | 1.0       | 107.7374      | 9.4677  |
| 0.8915        | 33.0  | 14322 | 1.0094          | 1.0       | 104.0952      | 9.5236  |
| 0.8674        | 34.0  | 14756 | 1.0122          | 1.0       | 103.2641      | 9.6102  |
| 0.8526        | 35.0  | 15190 | 1.0142          | 1.0       | 102.7155      | 9.6117  |
| 0.8341        | 36.0  | 15624 | 1.0082          | 1.0       | 108.7346      | 9.7729  |
| 0.8221        | 37.0  | 16058 | 1.0088          | 1.0       | 104.8310      | 9.8156  |
| 0.8143        | 38.0  | 16492 | 1.0000          | 1.0       | 104.3322      | 9.7812  |
| 0.786         | 39.0  | 16926 | 1.0022          | 1.0       | 109.3286      | 9.9159  |
| 0.7788        | 40.0  | 17360 | 0.9976          | 1.0       | 101.4251      | 9.9975  |
| 0.7692        | 41.0  | 17794 | 0.9967          | 1.0       | 100.4028      | 9.9916  |
| 0.7575        | 42.0  | 18228 | 0.9919          | 1.0       | 105.7741      | 10.0278 |
| 0.7472        | 43.0  | 18662 | 0.9993          | 1.0       | 101.8167      | 10.0009 |
| 0.7334        | 44.0  | 19096 | 0.9982          | 1.0       | 102.0833      | 10.0496 |
| 0.7249        | 45.0  | 19530 | 1.0036          | 1.0       | 107.9295      | 10.2319 |
| 0.698         | 46.0  | 19964 | 0.9953          | 1.0       | 106.0105      | 10.2895 |
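
The Bleu column tracks corpus-level BLEU on the evaluation set. A sketch of computing such a score with the evaluate library's sacrebleu metric follows; the exact scorer the authors used is an assumption.

```python
# Sketch of corpus-level BLEU via evaluate/sacrebleu (assumed scorer).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Книга лежит на столе."]   # decoded model outputs
references = [["Книга лежит на столе."]]  # one list of references per sample
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```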

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1