dux-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset was not recorded in the model card metadata (it is listed as "None"). It achieves the following results on the evaluation set:

  • Loss: 0.5259

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
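The hyperparameters above imply two derived quantities: the effective batch size per optimizer step and the learning rate at any given step. The sketch below reproduces both in plain Python. It assumes a single training device (the card does not state the device count) and assumes the linear scheduler ramps from 0 to the peak rate over the warmup steps, then decays linearly to 0 at the final step. The function names are illustrative and are not part of any library.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Decay phase: scale linearly from base_lr down to 0 at the last step.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 2000, 4000, 22000, 40000):
        print(s, linear_lr(s))
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 32, and the learning rate peaks at step 4000, one tenth of the way through training.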

Training results

Training Loss | Epoch | Step | Validation Loss
0.5083 | 20.4082 | 1000 | 0.4927
0.4805 | 40.8163 | 2000 | 0.4796
0.458 | 61.2245 | 3000 | 0.4803
0.4557 | 81.6327 | 4000 | 0.4823
0.4416 | 102.0408 | 5000 | 0.4887
0.4321 | 122.4490 | 6000 | 0.4811
0.4202 | 142.8571 | 7000 | 0.4814
0.4086 | 163.2653 | 8000 | 0.4822
0.4071 | 183.6735 | 9000 | 0.4870
0.3981 | 204.0816 | 10000 | 0.4906
0.391 | 224.4898 | 11000 | 0.4903
0.3927 | 244.8980 | 12000 | 0.4901
0.3812 | 265.3061 | 13000 | 0.4945
0.3807 | 285.7143 | 14000 | 0.4933
0.3706 | 306.1224 | 15000 | 0.4977
0.3752 | 326.5306 | 16000 | 0.4997
0.3699 | 346.9388 | 17000 | 0.5021
0.3755 | 367.3469 | 18000 | 0.5007
0.3627 | 387.7551 | 19000 | 0.5042
0.3678 | 408.1633 | 20000 | 0.5103
0.353 | 428.5714 | 21000 | 0.5080
0.357 | 448.9796 | 22000 | 0.5085
0.3546 | 469.3878 | 23000 | 0.5108
0.3517 | 489.7959 | 24000 | 0.5136
0.3495 | 510.2041 | 25000 | 0.5174
0.3481 | 530.6122 | 26000 | 0.5148
0.3397 | 551.0204 | 27000 | 0.5172
0.336 | 571.4286 | 28000 | 0.5160
0.3431 | 591.8367 | 29000 | 0.5198
0.342 | 612.2449 | 30000 | 0.5213
0.3516 | 632.6531 | 31000 | 0.5227
0.3347 | 653.0612 | 32000 | 0.5229
0.3322 | 673.4694 | 33000 | 0.5233
0.3372 | 693.8776 | 34000 | 0.5235
0.3311 | 714.2857 | 35000 | 0.5234
0.3273 | 734.6939 | 36000 | 0.5251
0.3343 | 755.1020 | 37000 | 0.5263
0.3301 | 775.5102 | 38000 | 0.5268
0.3327 | 795.9184 | 39000 | 0.5261
0.333 | 816.3265 | 40000 | 0.5259
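The logged results can be inspected programmatically. The sketch below, using validation losses copied from the table above, finds the evaluation step with the lowest validation loss and estimates the number of optimizer steps per epoch from the logged epoch and step columns. The helper names are hypothetical, and the steps-per-epoch figure is only an estimate derived from the rounded epoch values.

```python
# Validation losses from the table, one per 1000 training steps.
VAL_LOSS = [
    0.4927, 0.4796, 0.4803, 0.4823, 0.4887, 0.4811, 0.4814, 0.4822,
    0.4870, 0.4906, 0.4903, 0.4901, 0.4945, 0.4933, 0.4977, 0.4997,
    0.5021, 0.5007, 0.5042, 0.5103, 0.5080, 0.5085, 0.5108, 0.5136,
    0.5174, 0.5148, 0.5172, 0.5160, 0.5198, 0.5213, 0.5227, 0.5229,
    0.5233, 0.5235, 0.5234, 0.5251, 0.5263, 0.5268, 0.5261, 0.5259,
]
STEPS = [1000 * (i + 1) for i in range(len(VAL_LOSS))]


def best_checkpoint(steps, losses):
    """Return (step, loss) for the evaluation with the lowest validation loss."""
    i = min(range(len(losses)), key=losses.__getitem__)
    return steps[i], losses[i]


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


if __name__ == "__main__":
    step, loss = best_checkpoint(STEPS, VAL_LOSS)
    print("lowest validation loss:", loss, "at step", step)
    print("approx. steps per epoch:", round(steps_per_epoch(1000, 20.4082)))
```

In this table the validation loss is lowest early in training and trends upward afterwards while the training loss keeps falling, which may indicate overfitting. If intermediate checkpoints were saved, an earlier one might generalize better than the final one, though the card does not say which checkpoints are available.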

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Model size: 0.1B params (Safetensors, tensor type F32)

Model tree for sil-ai/dux-chapter-audio-dataset-force-aligned-speecht5: fine-tuned from microsoft/speecht5_tts