Model save
Files changed:
- README.md (+9, -54)
- logs/events.out.tfevents.1763566184.tikgpu10.939660.4 (+2, -2)
- model.safetensors (+1, -1)
README.md
CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- - Loss:
+ - Loss: 3.3080

## Model description

@@ -35,70 +35,25 @@ More information needed
### Training hyperparameters

The following hyperparameters were used during training:
- - learning_rate:
+ - learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- - lr_scheduler_warmup_steps:
+ - lr_scheduler_warmup_steps: 400
- - num_epochs:
+ - num_epochs: 5
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
-
- | 3.
- | 3.
- | 3.
- | 3.
- | 3.2996 | 6.0 | 1710 | 3.8942 |
- | 3.5063 | 7.0 | 1995 | 3.8985 |
- | 3.4073 | 8.0 | 2280 | 3.9164 |
- | 3.2897 | 9.0 | 2565 | 3.9335 |
- | 3.2355 | 10.0 | 2850 | 3.9501 |
- | 3.2153 | 11.0 | 3135 | 3.9670 |
- | 3.0633 | 12.0 | 3420 | 3.9940 |
- | 3.0258 | 13.0 | 3705 | 4.0125 |
- | 2.8951 | 14.0 | 3990 | 4.0485 |
- | 2.9628 | 15.0 | 4275 | 4.0615 |
- | 2.6961 | 16.0 | 4560 | 4.0907 |
- | 2.8086 | 17.0 | 4845 | 4.1207 |
- | 2.7014 | 18.0 | 5130 | 4.1463 |
- | 2.6813 | 19.0 | 5415 | 4.1685 |
- | 2.5686 | 20.0 | 5700 | 4.2127 |
- | 2.4509 | 21.0 | 5985 | 4.2431 |
- | 2.5327 | 22.0 | 6270 | 4.2569 |
- | 2.4029 | 23.0 | 6555 | 4.3080 |
- | 2.5409 | 24.0 | 6840 | 4.3201 |
- | 2.4863 | 25.0 | 7125 | 4.3456 |
- | 2.2923 | 26.0 | 7410 | 4.4077 |
- | 2.3704 | 27.0 | 7695 | 4.4213 |
- | 2.239 | 28.0 | 7980 | 4.4589 |
- | 2.4065 | 29.0 | 8265 | 4.4888 |
- | 2.1606 | 30.0 | 8550 | 4.5011 |
- | 2.3792 | 31.0 | 8835 | 4.5244 |
- | 2.0402 | 32.0 | 9120 | 4.5647 |
- | 2.2368 | 33.0 | 9405 | 4.5788 |
- | 2.1341 | 34.0 | 9690 | 4.6060 |
- | 2.0746 | 35.0 | 9975 | 4.6244 |
- | 2.1967 | 36.0 | 10260 | 4.6548 |
- | 2.0212 | 37.0 | 10545 | 4.6723 |
- | 2.0272 | 38.0 | 10830 | 4.6886 |
- | 2.0901 | 39.0 | 11115 | 4.7127 |
- | 2.1051 | 40.0 | 11400 | 4.7235 |
- | 2.0967 | 41.0 | 11685 | 4.7322 |
- | 1.9759 | 42.0 | 11970 | 4.7475 |
- | 1.9597 | 43.0 | 12255 | 4.7659 |
- | 1.9472 | 44.0 | 12540 | 4.7717 |
- | 1.9566 | 45.0 | 12825 | 4.7852 |
- | 2.1209 | 46.0 | 13110 | 4.7891 |
- | 1.9769 | 47.0 | 13395 | 4.7927 |
- | 1.8431 | 48.0 | 13680 | 4.7993 |
- | 1.8459 | 49.0 | 13965 | 4.8010 |
- | 2.0649 | 50.0 | 14250 | 4.8017 |
+ | 3.3201 | 1.0 | 7590 | 3.4258 |
+ | 3.2002 | 2.0 | 15180 | 3.3526 |
+ | 3.1497 | 3.0 | 22770 | 3.3187 |
+ | 3.0062 | 4.0 | 30360 | 3.3028 |
+ | 3.0219 | 5.0 | 37950 | 3.3080 |


### Framework versions
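Read as a training recipe, the updated card describes a standard `transformers` causal-LM fine-tune; since the reported eval loss is a cross-entropy, the final 3.3080 corresponds to a perplexity of exp(3.3080) ≈ 27.3. The sketch below shows how the listed hyperparameters would map onto `TrainingArguments`. It is illustrative only: the card names no dataset, so the corpus here is a stand-in, and `output_dir` and the per-epoch eval setting are assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stand-in corpus: the card says only "an unknown dataset".
raw = Dataset.from_dict({"text": ["example text"] * 64})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="gpt2-finetuned",   # assumption: the card gives no repo name
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",           # AdamW, betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="linear",
    warmup_steps=400,
    num_train_epochs=5,
    fp16=True,                     # "Native AMP" mixed precision
    eval_strategy="epoch",         # assumption, matching the per-epoch results table
)                                  # (`evaluation_strategy` on older transformers)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    eval_dataset=tokenized,        # stand-in; a real run would hold out a separate split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
print(trainer.evaluate())          # reports eval_loss, comparable to the table above
```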
logs/events.out.tfevents.1763566184.tikgpu10.939660.4
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:
- size
+ oid sha256:c0e5cfdeb7b4fa4856ffd88ea080b6fdadb5412c31231f10c7a8addc5510b4a4
+ size 816405
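The tfevents file above is a TensorBoard event log written during training. A minimal sketch for reading it back with TensorBoard's Python event-processing API, assuming the `tensorboard` package is installed; the scalar tag names it prints depend on the trainer and are not listed in this commit:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator(
    "logs/events.out.tfevents.1763566184.tikgpu10.939660.4"
)
acc.Reload()                      # parse the event file from disk
tags = acc.Tags()["scalars"]      # available scalar series
print(tags)                       # e.g. training and eval loss tags (assumed)
for tag in tags:
    for event in acc.Scalars(tag):
        print(tag, event.step, event.value)
```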
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:19b98e4b0580bf736acb417bd509373be5a8b2d0bf2f26af29476ccf4d346055
size 497774208
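Both binaries in this commit are stored as Git LFS pointers: `oid` is the SHA-256 of the blob's contents and `size` its length in bytes, so a checked-out file can be verified against its pointer. A minimal stdlib sketch, using the oids from this commit (run from the repo root after `git lfs pull`):

```python
import hashlib

def lfs_oid(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, streamed to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Expected oids taken from the LFS pointers in this commit.
expected = {
    "model.safetensors":
        "19b98e4b0580bf736acb417bd509373be5a8b2d0bf2f26af29476ccf4d346055",
    "logs/events.out.tfevents.1763566184.tikgpu10.939660.4":
        "c0e5cfdeb7b4fa4856ffd88ea080b6fdadb5412c31231f10c7a8addc5510b4a4",
}
for path, oid in expected.items():
    assert lfs_oid(path) == oid, f"checksum mismatch for {path}"
```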