dux-chapter-audio-dataset-force-aligned-speecht5
This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified in the card metadata. It achieves the following results on the evaluation set:
- Loss: 0.5259
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 4000
- training_steps: 40000
- mixed_precision_training: Native AMP
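The batch-size and scheduler settings above can be checked with a few lines of arithmetic. The sketch below is not the training code; it assumes a single device and assumes the `linear` scheduler ramps the learning rate from 0 to its peak over the warmup steps and then decays it linearly to 0 at the final step. The helper names `effective_batch_size` and `linear_lr` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from the peak to 0 at the last training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 4000, one tenth of the way through training, and 8 × 4 reproduces the listed total_train_batch_size of 32.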
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.5083 | 20.4082 | 1000 | 0.4927 |
| 0.4805 | 40.8163 | 2000 | 0.4796 |
| 0.458 | 61.2245 | 3000 | 0.4803 |
| 0.4557 | 81.6327 | 4000 | 0.4823 |
| 0.4416 | 102.0408 | 5000 | 0.4887 |
| 0.4321 | 122.4490 | 6000 | 0.4811 |
| 0.4202 | 142.8571 | 7000 | 0.4814 |
| 0.4086 | 163.2653 | 8000 | 0.4822 |
| 0.4071 | 183.6735 | 9000 | 0.4870 |
| 0.3981 | 204.0816 | 10000 | 0.4906 |
| 0.391 | 224.4898 | 11000 | 0.4903 |
| 0.3927 | 244.8980 | 12000 | 0.4901 |
| 0.3812 | 265.3061 | 13000 | 0.4945 |
| 0.3807 | 285.7143 | 14000 | 0.4933 |
| 0.3706 | 306.1224 | 15000 | 0.4977 |
| 0.3752 | 326.5306 | 16000 | 0.4997 |
| 0.3699 | 346.9388 | 17000 | 0.5021 |
| 0.3755 | 367.3469 | 18000 | 0.5007 |
| 0.3627 | 387.7551 | 19000 | 0.5042 |
| 0.3678 | 408.1633 | 20000 | 0.5103 |
| 0.353 | 428.5714 | 21000 | 0.5080 |
| 0.357 | 448.9796 | 22000 | 0.5085 |
| 0.3546 | 469.3878 | 23000 | 0.5108 |
| 0.3517 | 489.7959 | 24000 | 0.5136 |
| 0.3495 | 510.2041 | 25000 | 0.5174 |
| 0.3481 | 530.6122 | 26000 | 0.5148 |
| 0.3397 | 551.0204 | 27000 | 0.5172 |
| 0.336 | 571.4286 | 28000 | 0.5160 |
| 0.3431 | 591.8367 | 29000 | 0.5198 |
| 0.342 | 612.2449 | 30000 | 0.5213 |
| 0.3516 | 632.6531 | 31000 | 0.5227 |
| 0.3347 | 653.0612 | 32000 | 0.5229 |
| 0.3322 | 673.4694 | 33000 | 0.5233 |
| 0.3372 | 693.8776 | 34000 | 0.5235 |
| 0.3311 | 714.2857 | 35000 | 0.5234 |
| 0.3273 | 734.6939 | 36000 | 0.5251 |
| 0.3343 | 755.1020 | 37000 | 0.5263 |
| 0.3301 | 775.5102 | 38000 | 0.5268 |
| 0.3327 | 795.9184 | 39000 | 0.5261 |
| 0.333 | 816.3265 | 40000 | 0.5259 |
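In the table above, validation loss reaches its minimum early while training loss keeps falling, and the reported evaluation loss of 0.5259 is the value at the final step. The sketch below reads those facts from the logged numbers. The rows are copied from the table; the helper names `best_checkpoint` and `steps_per_epoch` are hypothetical, and whether a checkpoint was saved at the best step is not stated in the card.

```python
# (step, training_loss, validation_loss) rows copied from the results table.
LOG = [
    (1000, 0.5083, 0.4927), (2000, 0.4805, 0.4796), (3000, 0.458, 0.4803),
    (4000, 0.4557, 0.4823), (5000, 0.4416, 0.4887), (6000, 0.4321, 0.4811),
    (7000, 0.4202, 0.4814), (8000, 0.4086, 0.4822), (9000, 0.4071, 0.4870),
    (10000, 0.3981, 0.4906), (11000, 0.391, 0.4903), (12000, 0.3927, 0.4901),
    (13000, 0.3812, 0.4945), (14000, 0.3807, 0.4933), (15000, 0.3706, 0.4977),
    (16000, 0.3752, 0.4997), (17000, 0.3699, 0.5021), (18000, 0.3755, 0.5007),
    (19000, 0.3627, 0.5042), (20000, 0.3678, 0.5103), (21000, 0.353, 0.5080),
    (22000, 0.357, 0.5085), (23000, 0.3546, 0.5108), (24000, 0.3517, 0.5136),
    (25000, 0.3495, 0.5174), (26000, 0.3481, 0.5148), (27000, 0.3397, 0.5172),
    (28000, 0.336, 0.5160), (29000, 0.3431, 0.5198), (30000, 0.342, 0.5213),
    (31000, 0.3516, 0.5227), (32000, 0.3347, 0.5229), (33000, 0.3322, 0.5233),
    (34000, 0.3372, 0.5235), (35000, 0.3311, 0.5234), (36000, 0.3273, 0.5251),
    (37000, 0.3343, 0.5263), (38000, 0.3301, 0.5268), (39000, 0.3327, 0.5261),
    (40000, 0.333, 0.5259),
]


def best_checkpoint(log):
    """Return the logged row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one (step, epoch) pair."""
    return round(step / epoch)


if __name__ == "__main__":
    best_step, _, best_val = best_checkpoint(LOG)
    final_step, _, final_val = LOG[-1]
    print(f"lowest validation loss: {best_val:.4f} at step {best_step}")
    print(f"final validation loss:  {final_val:.4f} at step {final_step}")
    print(f"steps per epoch: {steps_per_epoch(1000, 20.4082)}")
```

The lowest logged validation loss is 0.4796 at step 2000, after which it drifts upward while training loss continues to drop, a pattern that usually suggests overfitting. The epoch column implies about 49 optimizer steps per epoch, which at a total batch size of 32 would correspond to a training set on the order of 1,500 examples, assuming a single device.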
Framework versions
- Transformers 4.57.1
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
Model tree for sil-ai/dux-chapter-audio-dataset-force-aligned-speecht5
Base model
microsoft/speecht5_tts