gpt2-mydataset

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.9846
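Since the evaluation loss for a causal LM is the mean per-token cross-entropy (in nats, as reported by the Transformers Trainer), the corresponding perplexity is its exponential. A minimal sketch:

```python
import math

# Perplexity of a causal LM is exp(mean cross-entropy loss),
# assuming the reported eval loss is the per-token NLL in nats.
eval_loss = 4.9846
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # roughly 146
```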

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
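The hyperparameters above map onto a Transformers `TrainingArguments` configuration roughly as follows. This is a sketch, not the original training script; `output_dir` is a placeholder, and any dataset-specific settings are omitted:

```python
from transformers import TrainingArguments

# Configuration implied by the hyperparameters listed above.
# output_dir is an assumption, not taken from the original run.
training_args = TrainingArguments(
    output_dir="gpt2-mydataset",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```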

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 8.0182        | 0.0617 | 50   | 7.4035          |
| 7.2474        | 0.1233 | 100  | 7.0215          |
| 6.8954        | 0.1850 | 150  | 6.7160          |
| 6.6165        | 0.2466 | 200  | 6.5297          |
| 6.4592        | 0.3083 | 250  | 6.3550          |
| 6.2465        | 0.3699 | 300  | 6.2113          |
| 6.1436        | 0.4316 | 350  | 6.0884          |
| 6.0491        | 0.4932 | 400  | 5.9975          |
| 5.975         | 0.5549 | 450  | 5.9086          |
| 5.8884        | 0.6165 | 500  | 5.8378          |
| 5.8221        | 0.6782 | 550  | 5.7791          |
| 5.8462        | 0.7398 | 600  | 5.7297          |
| 5.7512        | 0.8015 | 650  | 5.6896          |
| 5.6931        | 0.8631 | 700  | 5.6288          |
| 5.6135        | 0.9248 | 750  | 5.5984          |
| 5.5411        | 0.9864 | 800  | 5.5516          |
| 5.4014        | 1.0481 | 850  | 5.5243          |
| 5.3431        | 1.1097 | 900  | 5.4868          |
| 5.3665        | 1.1714 | 950  | 5.4619          |
| 5.3427        | 1.2330 | 1000 | 5.4313          |
| 5.2786        | 1.2947 | 1050 | 5.4047          |
| 5.3004        | 1.3564 | 1100 | 5.3722          |
| 5.279         | 1.4180 | 1150 | 5.3468          |
| 5.2892        | 1.4797 | 1200 | 5.3211          |
| 5.225         | 1.5413 | 1250 | 5.2964          |
| 5.243         | 1.6030 | 1300 | 5.2768          |
| 5.1481        | 1.6646 | 1350 | 5.2502          |
| 5.1373        | 1.7263 | 1400 | 5.2257          |
| 5.1689        | 1.7879 | 1450 | 5.2159          |
| 5.1515        | 1.8496 | 1500 | 5.1912          |
| 5.115         | 1.9112 | 1550 | 5.1717          |
| 5.1288        | 1.9729 | 1600 | 5.1469          |
| 4.911         | 2.0345 | 1650 | 5.1360          |
| 4.881         | 2.0962 | 1700 | 5.1215          |
| 4.8682        | 2.1578 | 1750 | 5.1092          |
| 4.9181        | 2.2195 | 1800 | 5.0962          |
| 4.904         | 2.2811 | 1850 | 5.0810          |
| 4.9309        | 2.3428 | 1900 | 5.0686          |
| 4.8559        | 2.4044 | 1950 | 5.0563          |
| 4.8654        | 2.4661 | 2000 | 5.0444          |
| 4.8656        | 2.5277 | 2050 | 5.0383          |
| 4.8428        | 2.5894 | 2100 | 5.0228          |
| 4.8463        | 2.6510 | 2150 | 5.0125          |
| 4.7709        | 2.7127 | 2200 | 5.0048          |
| 4.8147        | 2.7744 | 2250 | 4.9981          |
| 4.7904        | 2.8360 | 2300 | 4.9923          |
| 4.7581        | 2.8977 | 2350 | 4.9869          |
| 4.8169        | 2.9593 | 2400 | 4.9846          |
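The (epoch, step) pairs in the log pin down the epoch length. A quick back-of-the-envelope check (my arithmetic, not stated in the card):

```python
# At step 2400 the log reports epoch 2.9593, so one epoch is about
# 2400 / 2.9593 ≈ 811 optimizer steps. With train_batch_size=8 this
# implies roughly 811 * 8 ≈ 6488 training examples per epoch.
steps_per_epoch = round(2400 / 2.9593)
approx_examples = steps_per_epoch * 8
print(steps_per_epoch, approx_examples)  # 811 6488
```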

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1

Model details

  • Model ID: zehao888/gpt2-mydataset (fine-tuned from gpt2)
  • Model size: 0.1B params
  • Tensor type: F32 (Safetensors)