gen-text-poems

This model is a fine-tuned version of distilgpt2 on an unspecified dataset (the card records the dataset as None). Its evaluation-set results are the validation losses listed per epoch in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
num_epochs: 50
mixed_precision_training: Native AMP
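
To make the relationship between these settings concrete, here is a minimal sketch in plain Python. It shows how total_train_batch_size follows from train_batch_size and gradient_accumulation_steps, and how a cosine schedule decays the learning rate. The device count, the absence of warmup, and the total of 50 optimizer steps are assumptions, not values stated in the card, and the helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8

# Assumptions (not stated in the card): a single device, no warmup,
# and 50 optimizer steps in total (one step per epoch for 50 epochs).
NUM_DEVICES = 1
WARMUP_STEPS = 0
TOTAL_STEPS = 50


def effective_batch_size(per_device, accumulation_steps, num_devices=NUM_DEVICES):
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def cosine_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step under half-cosine decay."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr (unused when warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 10, 25, 40, 50):
        print(f"step {step:2d}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate starts at 3e-05, reaches half that value at the midpoint, and decays to zero at the final step.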

Training Loss	Epoch	Step	Validation Loss
No log	1.0	1	10.1858
No log	2.0	2	9.8156
No log	3.0	3	9.5954
No log	4.0	4	9.4465
No log	5.0	5	9.3399
No log	6.0	6	9.2648
No log	7.0	7	9.2078
No log	8.0	8	9.1611
No log	9.0	9	9.1152
9.3954	10.0	10	9.0680
9.3954	11.0	11	9.0206
9.3954	12.0	12	8.9763
9.3954	13.0	13	8.9398
9.3954	14.0	14	8.9157
9.3954	15.0	15	8.8867
9.3954	16.0	16	8.8490
9.3954	17.0	17	8.8148
9.3954	18.0	18	8.7874
9.3954	19.0	19	8.7735
8.3204	20.0	20	8.7601
8.3204	21.0	21	8.7281
8.3204	22.0	22	8.6956
8.3204	23.0	23	8.6778
8.3204	24.0	24	8.6708
8.3204	25.0	25	8.6569
8.3204	26.0	26	8.6378
8.3204	27.0	27	8.6196
8.3204	28.0	28	8.6066
8.3204	29.0	29	8.6014
7.5838	30.0	30	8.6017
7.5838	31.0	31	8.6036
7.5838	32.0	32	8.5972
7.5838	33.0	33	8.5879
7.5838	34.0	34	8.5830
7.5838	35.0	35	8.5835
7.5838	36.0	36	8.5827
7.5838	37.0	37	8.5812
7.5838	38.0	38	8.5794
7.5838	39.0	39	8.5804
7.1554	40.0	40	8.5795
7.1554	41.0	41	8.5774
7.1554	42.0	42	8.5772
7.1554	43.0	43	8.5767
7.1554	44.0	44	8.5766
7.1554	45.0	45	8.5772
7.1554	46.0	46	8.5781
7.1554	47.0	47	8.5785
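
The validation losses above can be easier to interpret as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state explicitly; the helper names are hypothetical. It converts a loss to perplexity and finds the lowest validation loss among the last rows of the table.

```python
import math

# (epoch, validation loss) pairs copied from the last rows of the table above.
ROWS = [
    (40, 8.5795),
    (41, 8.5774),
    (42, 8.5772),
    (43, 8.5767),
    (44, 8.5766),
    (45, 8.5772),
    (46, 8.5781),
    (47, 8.5785),
]


def perplexity(loss):
    """Perplexity, assuming loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


def best_epoch(rows):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    epoch, loss = best_epoch(ROWS)
    print(f"lowest validation loss: {loss} at epoch {epoch}")
    print(f"perplexity at that epoch: {perplexity(loss):.0f}")
```

Among these rows the validation loss bottoms out at epoch 44 and drifts slightly upward afterwards while the training loss keeps falling.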

Safetensors
Model size: 81.9M params
Tensor type: F32
