xlm-roberta-large-bs-16-lr-1e-05-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1

This model is a fine-tuned version of FacebookAI/xlm-roberta-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7260
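
The downstream task and dataset are not documented, so the snippet below is only a minimal loading sketch: it assumes a masked-language-modeling head (the default objective for XLM-RoBERTa checkpoints) and uses the Hub repository id shown on the model page. Swap the Auto class if the checkpoint actually carries a different task head.

```python
# Minimal loading sketch. The training objective is not stated in this card,
# so AutoModelForMaskedLM is an assumption; replace it with the appropriate
# task class (e.g. AutoModelForSequenceClassification) if needed.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "BounharAbdelaziz/xlm-roberta-large-bs-16-lr-1e-05-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Example: score a masked token (XLM-RoBERTa uses <mask> as its mask token).
inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
outputs = model(**inputs)
```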

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
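
These values map directly onto a Hugging Face TrainingArguments configuration. The sketch below reproduces them under stated assumptions: the output directory and evaluation cadence are illustrative, and the fp16 and max_grad_norm settings are inferred from the "FP16" and "gnm-1.0" parts of the model name rather than from the list above.

```python
# Hedged reconstruction of the training configuration; the dataset, model head,
# and Trainer wiring are not documented in this card and are therefore omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xlm-roberta-large-finetuned",   # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,              # effective batch size: 16 * 8 = 128
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                                  # "FP16" in the model name
    max_grad_norm=1.0,                          # "gnm-1.0" in the model name
    eval_strategy="steps",
    eval_steps=50,                              # evaluation was logged every 50 steps
)
```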

Training results

Training Loss Epoch Step Validation Loss
18.1163 0.0055 50 5.3796
17.8993 0.0109 100 5.0878
16.9877 0.0164 150 4.9054
16.6951 0.0219 200 4.7595
17.1956 0.0273 250 nan
15.8726 0.0328 300 4.3051
15.8035 0.0382 350 4.1517
15.7724 0.0437 400 4.1179
15.2753 0.0492 450 4.0127
15.3521 0.0546 500 3.9190
14.8835 0.0601 550 3.9637
14.0689 0.0656 600 3.8433
14.8118 0.0710 650 3.6712
14.2018 0.0765 700 3.6710
13.9958 0.0819 750 3.6602
13.7916 0.0874 800 3.5779
13.5979 0.0929 850 3.5067
13.4006 0.0983 900 3.6269
13.5289 0.1038 950 3.4551
13.8164 0.1093 1000 3.4879
13.4817 0.1147 1050 3.3742
13.3695 0.1202 1100 3.4005
13.5545 0.1256 1150 nan
13.302 0.1311 1200 3.4059
12.5605 0.1366 1250 3.2717
13.0459 0.1420 1300 3.2960
12.2962 0.1475 1350 3.3724
12.8443 0.1530 1400 3.4170
12.8099 0.1584 1450 3.2809
12.7225 0.1639 1500 3.2522
12.3512 0.1694 1550 3.2800
12.1792 0.1748 1600 3.1975
12.1118 0.1803 1650 3.3046
12.6203 0.1857 1700 3.1530
12.1681 0.1912 1750 3.2485
12.1655 0.1967 1800 3.1499
12.2293 0.2021 1850 nan
13.0211 0.2076 1900 3.1991
12.4299 0.2131 1950 3.0704
12.6538 0.2185 2000 3.1030
12.6587 0.2240 2050 3.0889
12.2565 0.2294 2100 3.1421
12.5572 0.2349 2150 3.1872
12.2115 0.2404 2200 3.0676
12.0915 0.2458 2250 3.1293
12.219 0.2513 2300 3.0100
11.6999 0.2568 2350 3.0791
11.6684 0.2622 2400 2.9914
12.3173 0.2677 2450 3.0561
11.422 0.2731 2500 3.0956
11.5678 0.2786 2550 3.0362
12.0914 0.2841 2600 3.1488
11.7267 0.2895 2650 3.0109
11.8665 0.2950 2700 3.0516
11.8245 0.3005 2750 nan
11.5481 0.3059 2800 3.0401
11.9585 0.3114 2850 3.0800
11.539 0.3169 2900 3.0259
12.1617 0.3223 2950 2.9229
11.5909 0.3278 3000 nan
10.9824 0.3332 3050 2.7986
11.6864 0.3387 3100 2.9532
11.8222 0.3442 3150 2.9858
11.6286 0.3496 3200 2.9151
11.8334 0.3551 3250 2.9736
11.2803 0.3606 3300 2.8687
11.6026 0.3660 3350 2.8898
11.1204 0.3715 3400 2.8825
11.423 0.3769 3450 2.9872
11.4879 0.3824 3500 nan
11.573 0.3879 3550 2.8579
12.4166 0.3933 3600 2.9434
11.2353 0.3988 3650 2.8829
11.4 0.4043 3700 2.9354
10.9686 0.4097 3750 2.9237
11.4042 0.4152 3800 3.0190
11.8554 0.4207 3850 2.9271
11.9632 0.4261 3900 3.0376
11.1689 0.4316 3950 2.8588
11.4857 0.4370 4000 3.0363
11.4006 0.4425 4050 nan
10.7614 0.4480 4100 2.9261
11.5512 0.4534 4150 2.8923
11.5332 0.4589 4200 2.9156
11.0784 0.4644 4250 2.8987
10.7075 0.4698 4300 2.8653
11.6934 0.4753 4350 2.9077
11.4598 0.4807 4400 2.8124
11.1979 0.4862 4450 2.9331
10.9605 0.4917 4500 2.8848
11.1071 0.4971 4550 2.8974
11.2478 0.5026 4600 2.8152
11.0701 0.5081 4650 nan
11.6265 0.5135 4700 2.7508
11.4536 0.5190 4750 2.8445
11.135 0.5244 4800 nan
10.6803 0.5299 4850 2.8144
11.5174 0.5354 4900 nan
10.9389 0.5408 4950 2.8543
10.7885 0.5463 5000 2.8492
11.4897 0.5518 5050 2.8976
11.0512 0.5572 5100 2.8881
11.0947 0.5627 5150 2.8912
10.8325 0.5682 5200 2.8968
11.0708 0.5736 5250 2.8654
10.7451 0.5791 5300 2.8796
11.1151 0.5845 5350 2.8472
11.1104 0.5900 5400 2.7230
10.8407 0.5955 5450 2.8088
11.3176 0.6009 5500 2.8639
10.667 0.6064 5550 2.8375
11.2007 0.6119 5600 2.7334
10.8507 0.6173 5650 2.9190
10.587 0.6228 5700 2.7530
10.9799 0.6282 5750 2.8033
10.9086 0.6337 5800 2.8213
11.3728 0.6392 5850 2.9052
10.8764 0.6446 5900 2.7960
10.801 0.6501 5950 2.7806
10.9585 0.6556 6000 2.8511
10.3226 0.6610 6050 2.8546
11.29 0.6665 6100 2.7878
11.0151 0.6719 6150 2.7628
10.9404 0.6774 6200 2.7391
10.6352 0.6829 6250 2.8001
10.8923 0.6883 6300 2.7758
11.1026 0.6938 6350 2.7681
10.9816 0.6993 6400 2.8254
10.8846 0.7047 6450 2.7311
11.5027 0.7102 6500 2.8385
10.9822 0.7157 6550 2.9160
10.2748 0.7211 6600 2.7859
10.8561 0.7266 6650 2.7536
11.0482 0.7320 6700 2.7821
10.7278 0.7375 6750 2.7903
11.1712 0.7430 6800 2.8584
11.2571 0.7484 6850 2.8279
11.1056 0.7539 6900 2.7671
10.6717 0.7594 6950 nan
11.6541 0.7648 7000 2.7546
10.9426 0.7703 7050 nan
10.9949 0.7757 7100 2.8047
10.5327 0.7812 7150 2.7311
10.6881 0.7867 7200 2.6669
10.6447 0.7921 7250 2.7921
10.6472 0.7976 7300 2.7847
10.8787 0.8031 7350 2.8591
10.2926 0.8085 7400 nan
10.2332 0.8140 7450 2.7988
11.0911 0.8194 7500 2.7928
10.3863 0.8249 7550 2.7478
11.0595 0.8304 7600 2.7977
11.0389 0.8358 7650 2.8220
10.4349 0.8413 7700 2.7518
10.9154 0.8468 7750 2.8029
11.232 0.8522 7800 2.7387
10.7908 0.8577 7850 2.6633
10.629 0.8632 7900 2.7734
10.7351 0.8686 7950 2.7511
10.7233 0.8741 8000 2.7507
10.5188 0.8795 8050 2.8178
10.3795 0.8850 8100 2.7701
10.3011 0.8905 8150 2.7541
10.9437 0.8959 8200 2.7943
10.4525 0.9014 8250 2.7283
10.832 0.9069 8300 2.6824
10.4682 0.9123 8350 2.7700
10.8953 0.9178 8400 2.7677
10.7738 0.9232 8450 2.8578
10.9378 0.9287 8500 2.7341
11.4643 0.9342 8550 2.7165
10.8033 0.9396 8600 2.7774
10.4552 0.9451 8650 2.6508
10.6864 0.9506 8700 2.8617
11.5173 0.9560 8750 2.6617
10.8722 0.9615 8800 2.6334
10.4516 0.9669 8850 2.8097
10.9324 0.9724 8900 2.7791
10.3205 0.9779 8950 2.6963
10.6569 0.9833 9000 2.7603
10.8572 0.9888 9050 nan
11.0403 0.9943 9100 2.7983
11.1694 0.9997 9150 2.7260

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.0