metadata
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: impossible-llms-english-random-trigram
  results: []
impossible-llms-english-random-trigram
This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.3113
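For readers who prefer perplexity, the reported loss can be converted with a single exponentiation. This is a minimal sketch that assumes the loss is a mean per-token cross-entropy in nats. The card lists a label smoothing factor of 0.1, so if the reported value is the smoothed loss, the result may not equal the model's true perplexity.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 4.3113

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"approximate perplexity: {perplexity:.2f}")
```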
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 12
- eval_batch_size: 8
- seed: 0
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 384
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 3000
- mixed_precision_training: Native AMP
- label_smoothing_factor: 0.1
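The derived values in the list above follow from the per-device settings. The sketch below recomputes the total batch sizes and illustrates one common form of a cosine schedule with linear warmup. The variable names and the `lr_at` helper are hypothetical, and the exact schedule implementation used for this run is an assumption, not something the card confirms.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch_size = 12
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 8
max_steps = 3000
warmup_ratio = 0.1
peak_lr = 1e-4

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Assumption: warmup length is the warmup ratio applied to the total step count, rounded up.
warmup_steps = math.ceil(max_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size, warmup_steps)
for step in (0, 150, 300, 1650, 3000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 300 and decays to zero by step 3000.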
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
14.1482 | 1.0 | 96 | 6.9646 |
11.4328 | 2.0 | 192 | 5.6992 |
11.1488 | 3.0 | 288 | 5.5112 |
10.5646 | 4.0 | 384 | 5.2485 |
10.2163 | 5.0 | 480 | 5.0376 |
9.8751 | 6.0 | 576 | 4.8854 |
9.6552 | 7.0 | 672 | 4.7683 |
9.4312 | 8.0 | 768 | 4.6836 |
9.301 | 9.0 | 864 | 4.6148 |
9.2448 | 10.0 | 960 | 4.5597 |
9.1271 | 11.0 | 1056 | 4.5156 |
9.0854 | 12.0 | 1152 | 4.4794 |
8.9255 | 13.0 | 1248 | 4.4493 |
8.8784 | 14.0 | 1344 | 4.4255 |
8.7833 | 15.0 | 1440 | 4.4035 |
8.6755 | 16.0 | 1536 | 4.3862 |
8.6895 | 17.0 | 1632 | 4.3722 |
8.6269 | 18.0 | 1728 | 4.3582 |
8.5067 | 19.0 | 1824 | 4.3492 |
8.4444 | 20.0 | 1920 | 4.3404 |
8.5608 | 21.0 | 2016 | 4.3332 |
8.4592 | 22.0 | 2112 | 4.3274 |
8.4261 | 23.0 | 2208 | 4.3233 |
8.471 | 24.0 | 2304 | 4.3193 |
8.3813 | 25.0 | 2400 | 4.3163 |
8.3404 | 26.0 | 2496 | 4.3149 |
8.3891 | 27.0 | 2592 | 4.3132 |
8.3628 | 28.0 | 2688 | 4.3122 |
8.4306 | 29.0 | 2784 | 4.3117 |
8.2589 | 30.0 | 2880 | 4.3113 |
8.247 | 31.0 | 2976 | 4.3113 |
33.3577 | 31.2520 | 3000 | 4.3113 |
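The table logs one evaluation every 96 steps, so 3000 steps correspond to roughly 31.25 epochs, close to the 31.2520 logged in the last row. The sketch below uses a handful of rows copied from the table to estimate the average validation-loss improvement per epoch between checkpoints, which gives a rough view of where the curve flattens. The row selection and variable names are illustrative choices, not part of the original training setup.

```python
# (step, validation_loss) pairs copied from selected rows of the table.
rows = [
    (96, 6.9646),
    (960, 4.5597),
    (1920, 4.3404),
    (2880, 4.3113),
    (3000, 4.3113),
]

# One epoch spans 96 optimizer steps according to the Step column.
steps_per_epoch = 96
epochs_at_end = 3000 / steps_per_epoch

# Average validation-loss improvement per epoch between consecutive selected rows.
improvements = []
for (step_a, loss_a), (step_b, loss_b) in zip(rows, rows[1:]):
    epochs_elapsed = (step_b - step_a) / steps_per_epoch
    improvements.append((loss_a - loss_b) / epochs_elapsed)

print(f"epochs at step 3000: {epochs_at_end}")
for (step_a, _), (step_b, _), gain in zip(rows, rows[1:], improvements):
    print(f"steps {step_a}-{step_b}: {gain:.4f} loss per epoch")
```

On these rows the per-epoch gain shrinks at every interval and reaches zero over the final 120 steps.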
Framework versions
- Transformers 4.49.0
- Pytorch 2.4.0+cu121
- Datasets 3.4.0
- Tokenizers 0.21.0