ViDolphin-v1

This model is a fine-tuned version of ByteDance/Dolphin. The training dataset is not specified. Validation loss on the evaluation set is reported in the table below; the last logged value is 0.0640 at step 11000.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.01
num_epochs: 10
mixed_precision_training: Native AMP
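
The hyperparameters above can be related to each other with a little arithmetic. The following is a minimal sketch, not the training script itself: it assumes a single training device (so that 8 x 4 gives the listed total batch size of 32) and mirrors the usual linear warmup followed by linear decay. `TOTAL_STEPS` is a hypothetical value chosen only for illustration, since the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.01

# Assumption: one training device, so the effective batch size is the
# per-device batch size times the number of gradient accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Ramp up from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Decay from the peak learning rate down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total step count, for illustration only (not from the card).
TOTAL_STEPS = 10_000

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, 50, 100, 5_000, 10_000):
        print(step, linear_lr(step, TOTAL_STEPS))
```

With a warmup ratio of 0.01, only about 1% of the optimizer steps are spent ramping up; the rest of training runs on the decaying part of the schedule.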

Training Loss	Epoch	Step	Validation Loss
0.2154	0.4365	500	0.1682
0.1639	0.8730	1000	0.1235
0.1091	1.3090	1500	0.1027
0.0988	1.7455	2000	0.0896
0.0835	2.1816	2500	0.0831
0.0748	2.6181	3000	0.0785
0.0735	3.0541	3500	0.0754
0.0634	3.4906	4000	0.0729
0.0563	3.9271	4500	0.0708
0.0659	4.3632	5000	0.0696
0.0539	4.7997	5500	0.0680
0.0554	5.2357	6000	0.0676
0.055	5.6722	6500	0.0660
0.057	6.1082	7000	0.0660
0.0447	6.5447	7500	0.0658
0.0456	6.9812	8000	0.0647
0.042	7.4173	8500	0.0646
0.0482	7.8538	9000	0.0646
0.0386	8.2898	9500	0.0643
0.046	8.7263	10000	0.0639
0.0436	9.1624	10500	0.0642
0.0428	9.5989	11000	0.0640
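
To make the table easier to work with, the sketch below copies its rows into Python, picks the checkpoint with the lowest validation loss, and estimates the number of optimizer steps per epoch from the logged step and epoch columns. The steps-per-epoch figure is an estimate derived from the table, not a value stated in the card, and the variable names are illustrative.

```python
# (training loss, epoch, step, validation loss), copied from the table above.
ROWS = [
    (0.2154, 0.4365, 500, 0.1682),
    (0.1639, 0.8730, 1000, 0.1235),
    (0.1091, 1.3090, 1500, 0.1027),
    (0.0988, 1.7455, 2000, 0.0896),
    (0.0835, 2.1816, 2500, 0.0831),
    (0.0748, 2.6181, 3000, 0.0785),
    (0.0735, 3.0541, 3500, 0.0754),
    (0.0634, 3.4906, 4000, 0.0729),
    (0.0563, 3.9271, 4500, 0.0708),
    (0.0659, 4.3632, 5000, 0.0696),
    (0.0539, 4.7997, 5500, 0.0680),
    (0.0554, 5.2357, 6000, 0.0676),
    (0.0550, 5.6722, 6500, 0.0660),
    (0.0570, 6.1082, 7000, 0.0660),
    (0.0447, 6.5447, 7500, 0.0658),
    (0.0456, 6.9812, 8000, 0.0647),
    (0.0420, 7.4173, 8500, 0.0646),
    (0.0482, 7.8538, 9000, 0.0646),
    (0.0386, 8.2898, 9500, 0.0643),
    (0.0460, 8.7263, 10000, 0.0639),
    (0.0436, 9.1624, 10500, 0.0642),
    (0.0428, 9.5989, 11000, 0.0640),
]

# Row with the lowest validation loss.
best_row = min(ROWS, key=lambda row: row[3])
best_step, best_val_loss = best_row[2], best_row[3]

# Estimate optimizer steps per epoch from the last logged row (step / epoch).
_, last_epoch, last_step, _ = ROWS[-1]
steps_per_epoch = last_step / last_epoch

# Total improvement in validation loss between the first and last evaluation.
val_loss_drop = ROWS[0][3] - ROWS[-1][3]

if __name__ == "__main__":
    print("best step:", best_step, "validation loss:", best_val_loss)
    print("estimated steps per epoch:", round(steps_per_epoch))
    print("validation loss drop:", round(val_loss_drop, 4))
```

In the table, the lowest validation loss is 0.0639 at step 10000, and validation loss changes by less than 0.002 after roughly epoch 6 while training loss keeps falling.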

Model size: 0.4B params (Safetensors)
Tensor type: I64, F32
