dux-chapter-audio-dataset-force-aligned-speecht5

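This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not recorded in the card metadata (it is listed as "None"). On the evaluation set, the model reaches a validation loss of 0.5259 at the final training step (40000).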

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
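
The hyperparameters above imply two derived quantities: the effective batch size and the learning rate at any given step. The sketch below is an illustration, not the training script itself. It assumes the `linear` scheduler follows the usual warmup-then-linear-decay shape (as in `get_linear_schedule_with_warmup` from transformers), and the function name `linear_schedule_lr` is hypothetical.

```python
# Values copied from the hyperparameter list above.
BASE_LR = 1e-4
WARMUP_STEPS = 4000
TOTAL_STEPS = 40000
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
TOTAL_TRAIN_BATCH = 32

# total = per-device batch * accumulation steps * number of devices,
# so the listed total of 32 implies a single device.
num_devices = TOTAL_TRAIN_BATCH // (PER_DEVICE_BATCH * GRAD_ACCUM_STEPS)


def linear_schedule_lr(step, base_lr=BASE_LR,
                       warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumption: this mirrors the common warmup/decay formula; it is a
    hypothetical helper, not code taken from the original training run.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("devices implied by batch sizes:", num_devices)
    for s in (0, 2000, 4000, 22000, 40000):
        print(s, linear_schedule_lr(s))
```

Under these assumptions, warmup covers the first 10% of training (4000 of 40000 steps), the learning rate peaks at 0.0001 at step 4000, and it decays to zero at the final step.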

Training Loss	Epoch	Step	Validation Loss
0.5083	20.4082	1000	0.4927
0.4805	40.8163	2000	0.4796
0.458	61.2245	3000	0.4803
0.4557	81.6327	4000	0.4823
0.4416	102.0408	5000	0.4887
0.4321	122.4490	6000	0.4811
0.4202	142.8571	7000	0.4814
0.4086	163.2653	8000	0.4822
0.4071	183.6735	9000	0.4870
0.3981	204.0816	10000	0.4906
0.391	224.4898	11000	0.4903
0.3927	244.8980	12000	0.4901
0.3812	265.3061	13000	0.4945
0.3807	285.7143	14000	0.4933
0.3706	306.1224	15000	0.4977
0.3752	326.5306	16000	0.4997
0.3699	346.9388	17000	0.5021
0.3755	367.3469	18000	0.5007
0.3627	387.7551	19000	0.5042
0.3678	408.1633	20000	0.5103
0.353	428.5714	21000	0.5080
0.357	448.9796	22000	0.5085
0.3546	469.3878	23000	0.5108
0.3517	489.7959	24000	0.5136
0.3495	510.2041	25000	0.5174
0.3481	530.6122	26000	0.5148
0.3397	551.0204	27000	0.5172
0.336	571.4286	28000	0.5160
0.3431	591.8367	29000	0.5198
0.342	612.2449	30000	0.5213
0.3516	632.6531	31000	0.5227
0.3347	653.0612	32000	0.5229
0.3322	673.4694	33000	0.5233
0.3372	693.8776	34000	0.5235
0.3311	714.2857	35000	0.5234
0.3273	734.6939	36000	0.5251
0.3343	755.1020	37000	0.5263
0.3301	775.5102	38000	0.5268
0.3327	795.9184	39000	0.5261
0.333	816.3265	40000	0.5259
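
The table above can be summarized with a few lines of arithmetic. The sketch below copies the loss columns from the table (one value per 1000 steps) and reports the step with the lowest validation loss, the gap between training and validation loss at the end of training, and the approximate number of optimizer steps per epoch. The variable names are illustrative only.

```python
# Loss values copied from the table above, one entry per 1000 steps.
train_loss = [
    0.5083, 0.4805, 0.458, 0.4557, 0.4416, 0.4321, 0.4202, 0.4086, 0.4071, 0.3981,
    0.391, 0.3927, 0.3812, 0.3807, 0.3706, 0.3752, 0.3699, 0.3755, 0.3627, 0.3678,
    0.353, 0.357, 0.3546, 0.3517, 0.3495, 0.3481, 0.3397, 0.336, 0.3431, 0.342,
    0.3516, 0.3347, 0.3322, 0.3372, 0.3311, 0.3273, 0.3343, 0.3301, 0.3327, 0.333,
]
val_loss = [
    0.4927, 0.4796, 0.4803, 0.4823, 0.4887, 0.4811, 0.4814, 0.4822, 0.4870, 0.4906,
    0.4903, 0.4901, 0.4945, 0.4933, 0.4977, 0.4997, 0.5021, 0.5007, 0.5042, 0.5103,
    0.5080, 0.5085, 0.5108, 0.5136, 0.5174, 0.5148, 0.5172, 0.5160, 0.5198, 0.5213,
    0.5227, 0.5229, 0.5233, 0.5235, 0.5234, 0.5251, 0.5263, 0.5268, 0.5261, 0.5259,
]
steps = [1000 * (i + 1) for i in range(len(val_loss))]

# Checkpoint with the lowest validation loss.
best_index = min(range(len(val_loss)), key=lambda i: val_loss[i])
best_step = steps[best_index]
best_val = val_loss[best_index]

# Gap between validation and training loss at the last logged step.
final_gap = val_loss[-1] - train_loss[-1]

# The first row logs epoch 20.4082 at step 1000, so one epoch is about
# 1000 / 20.4082 optimizer steps.
steps_per_epoch = round(1000 / 20.4082)

if __name__ == "__main__":
    print("best step:", best_step, "validation loss:", best_val)
    print("final train/validation gap:", round(final_gap, 4))
    print("approx. optimizer steps per epoch:", steps_per_epoch)
```

Read this way, validation loss is lowest early in training and rises afterwards while training loss keeps falling, a pattern usually associated with overfitting. With about 49 optimizer steps per epoch at an effective batch size of 32, the training set is likely on the order of 1,500 examples, although the exact count depends on how partial batches were handled.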

Model size: 0.1B params (Safetensors)
Tensor type: F32