Swedish

A Swedish text-to-speech model fine-tuned from F5-TTS on approximately 200 hours of speech drawn from the Common Voice dataset and parliamentary recordings from the RixVox dataset. Training was conducted locally on a single RTX 4080.

Dataset preparation scripts can be found at https://github.com/ChiliOlavi/F5-TTS/tree/swedish-tts

Training Configuration

--exp_name F5TTS_v1_Base
--learning_rate 0.0001
--batch_size_per_gpu 2000
--batch_size_type frame
--max_samples 96
--grad_accumulation_steps 16
--max_grad_norm 0.3
--epochs 100
--num_warmup_updates 3000
--save_per_updates 10000
--keep_last_n_checkpoints -1
--last_per_updates 5000
--tokenizer pinyin
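As a rough sketch of what these flags imply for one optimizer step: with frame-based batching (`--batch_size_type frame`), `--batch_size_per_gpu` counts mel-spectrogram frames rather than utterances, and gradients accumulate over `--grad_accumulation_steps` forward passes. The arithmetic below assumes F5-TTS's default mel settings (24 kHz sample rate, 256-sample hop); treat the exact figures as illustrative.

```python
# Sketch: effective optimizer-step size implied by the training flags above.
# Assumes frame-based batching, where batch_size_per_gpu counts mel frames.

batch_size_per_gpu = 2000      # frames per GPU per forward pass
grad_accumulation_steps = 16   # forward passes accumulated per update
num_gpus = 1                   # trained locally on one RTX 4080

# Frames contributing to a single optimizer update:
frames_per_update = batch_size_per_gpu * grad_accumulation_steps * num_gpus
print(frames_per_update)  # 32000

# Assuming 24 kHz audio with a 256-sample hop, one frame covers 256/24000 s,
# so each update sees roughly this many seconds of speech:
seconds_per_frame = 256 / 24000
print(round(frames_per_update * seconds_per_frame, 1))  # 341.3
```

In other words, each weight update aggregates gradients from roughly five and a half minutes of audio, which is how a modest 2000-frame per-GPU batch still yields stable training on a single consumer GPU.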

Inference Parameters

{dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4}
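This dict matches the DiT backbone hyperparameters of the F5TTS_v1_Base architecture, which must be supplied when loading the fine-tuned checkpoint. A minimal sketch below restates it as a Python config and derives two standard quantities (per-head attention width and feed-forward inner width); the derivations are ordinary multi-head-attention arithmetic, not values stated in this card.

```python
# Sketch: the architecture dict above as a Python config, plus two
# derived sizes. Names follow F5-TTS's DiT backbone parameters.

model_cfg = dict(dim=1024, depth=22, heads=16, ff_mult=2,
                 text_dim=512, conv_layers=4)

# Per-head attention dimension: dim must divide evenly by heads.
head_dim = model_cfg["dim"] // model_cfg["heads"]
print(head_dim)  # 64

# Feed-forward inner width: dim scaled by ff_mult.
ff_dim = model_cfg["dim"] * model_cfg["ff_mult"]
print(ff_dim)  # 2048
```

Passing a mismatched config (e.g. the wrong `dim` or `depth`) will fail to load the checkpoint's weights, so keep this dict alongside the model files.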

Thanks

Special thanks to Amos Wallgren for quality assurance.


Model tree for EkhoCollective/f5-tts-swedish

Base model: SWivid/F5-TTS (this model is a fine-tune)

Datasets used to train EkhoCollective/f5-tts-swedish

Common Voice (Swedish)
RixVox