# 34d70a537ce078dde04eb63ef7a36429
This model is a fine-tuned version of [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking) on the stsb subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set:

- Loss: 0.6108
- Data Size: 1.0 (fraction of the full training set)
- Epoch Runtime: 20.6449 s
- MSE: 0.6109
- MAE: 0.5973
- R2: 0.7267
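MSE, MAE, and R2 are the standard regression metrics for the STS-B similarity task. A minimal sketch of how they are computed, in plain Python with toy scores (not the actual evaluation data):

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy similarity scores in the STS-B 0-5 range.
gold = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.9]
print(mse(gold, pred), mae(gold, pred), r2(gold, pred))
```

An R2 of 0.7267 means the model's predictions explain roughly 73% of the variance in the gold similarity scores.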
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
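These settings map onto `transformers.TrainingArguments` roughly as follows. This is a hedged sketch, not the actual training script: `output_dir` is a placeholder, and the multi-GPU total batch size of 32 comes from the per-device size of 8 across 4 devices rather than from an explicit argument.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is hypothetical.
args = TrainingArguments(
    output_dir="stsb-bert-large",    # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs -> total 32
    per_device_eval_batch_size=8,    # x 4 GPUs -> total 32
    seed=42,
    optim="adamw_torch",             # betas/epsilon above are the AdamW defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```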
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | MSE | MAE | R2 |
|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 7.4515 | 0 | 1.6898 | 7.4527 | 2.3093 | -2.3338 |
| No log | 1 | 179 | 2.8020 | 0.0078 | 2.1389 | 2.8029 | 1.4234 | -0.2538 |
| No log | 2 | 358 | 2.5336 | 0.0156 | 2.3082 | 2.5343 | 1.2969 | -0.1337 |
| No log | 3 | 537 | 1.6077 | 0.0312 | 3.1354 | 1.6082 | 1.0274 | 0.2806 |
| No log | 4 | 716 | 1.2374 | 0.0625 | 4.0468 | 1.2378 | 0.9115 | 0.4463 |
| No log | 5 | 895 | 1.9482 | 0.125 | 5.7737 | 1.9486 | 1.1644 | 0.1283 |
| 0.097 | 6 | 1074 | 0.9205 | 0.25 | 8.4272 | 0.9209 | 0.7591 | 0.5881 |
| 0.7278 | 7 | 1253 | 0.6312 | 0.5 | 11.8686 | 0.6317 | 0.6472 | 0.7174 |
| 0.4931 | 8 | 1432 | 0.5005 | 1.0 | 20.6268 | 0.5005 | 0.5416 | 0.7761 |
| 0.3608 | 9 | 1611 | 0.6133 | 1.0 | 20.8483 | 0.6135 | 0.6208 | 0.7256 |
| 0.2698 | 10 | 1790 | 0.4684 | 1.0 | 19.7219 | 0.4687 | 0.5137 | 0.7903 |
| 0.2071 | 11 | 1969 | 0.5011 | 1.0 | 20.0212 | 0.5012 | 0.5274 | 0.7758 |
| 0.2098 | 12 | 2148 | 0.5021 | 1.0 | 19.8743 | 0.5025 | 0.5510 | 0.7752 |
| 0.1783 | 13 | 2327 | 0.4792 | 1.0 | 19.7704 | 0.4794 | 0.5308 | 0.7855 |
| 0.1448 | 14 | 2506 | 0.6108 | 1.0 | 20.6449 | 0.6109 | 0.5973 | 0.7267 |
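Note that the lowest validation loss (0.4684, at epoch 10) comes four epochs before the final checkpoint, whose loss is 0.6108. A minimal sketch of scanning such a results table for the best checkpoint, with (epoch, validation loss) pairs transcribed from the later rows above:

```python
# (epoch, validation_loss) pairs transcribed from the results table.
history = [
    (8, 0.5005), (9, 0.6133), (10, 0.4684), (11, 0.5011),
    (12, 0.5021), (13, 0.4792), (14, 0.6108),
]

# Pick the epoch with the lowest validation loss.
best_epoch, best_loss = min(history, key=lambda row: row[1])
print(best_epoch, best_loss)
```

In practice this selection is what `load_best_model_at_end` automates when checkpoints are saved per epoch.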
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1