FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System
👉🏻 FireRedTTS-1S Paper 👈🏻
👉🏻 FireRedTTS-1S Demos 👈🏻
News
- [2025/05/26] 🔥 We added a flow-matching decoder and updated the technical report.
- [2025/03/25] 🔥 We released the technical report and project page.
Roadmap
- 2025/04
- Release the pre-trained checkpoints and inference code.
Usage
Clone and install
- Clone the repo
git clone https://github.com/FireRedTeam/FireRedTTS.git
cd FireRedTTS
- Create conda env
# step1.create env
conda create --name redtts python=3.10
conda activate redtts
# step2.install torch (the PyTorch build should match the CUDA version on your machine)
# CUDA 11.8
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
# CUDA 12.1
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
# step3.install fireredtts from source
cd fireredtts
pip install -e .
# step4.install other requirements
pip install -r requirements.txt
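After step 4 you may want to confirm that the installed PyTorch build matches the pins used in step 2. The sketch below is a hypothetical helper (`check_torch_build` is not part of FireRedTTS); it only compares version strings, so you pass it `torch.__version__` and `torch.version.cuda` from your own environment.

```python
from typing import List, Optional

# Pins copied from the conda commands in step 2.
EXPECTED_TORCH = "2.3.1"
SUPPORTED_CUDA = ("11.8", "12.1")


def check_torch_build(torch_version: str, cuda_build: Optional[str]) -> List[str]:
    """Return mismatches between the installed torch and the README pins.

    Hypothetical helper: pass torch.__version__ and torch.version.cuda.
    An empty list means the build matches the pins.
    """
    problems: List[str] = []
    # pip wheels append a local tag such as "+cu118"; compare only the base version.
    base_version = torch_version.split("+", 1)[0]
    if base_version != EXPECTED_TORCH:
        problems.append(f"expected torch {EXPECTED_TORCH}, found {base_version}")
    # torch.version.cuda is None for CPU-only builds.
    if cuda_build is None:
        problems.append("torch is a CPU-only build")
    elif cuda_build not in SUPPORTED_CUDA:
        problems.append(
            f"torch was built for CUDA {cuda_build}, expected one of {SUPPORTED_CUDA}"
        )
    return problems


if __name__ == "__main__":
    print(check_torch_build("2.3.1", "11.8"))  # prints []
```

In a real environment you would call it as `import torch; print(check_torch_build(torch.__version__, torch.version.cuda))`.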
Download models
Download the required model files from Model_Lists and place them in the pretrained_models folder.
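A missing or empty model folder is easier to diagnose before the model is constructed. The following is a minimal sketch with a hypothetical helper, `resolve_pretrained_dir`; it assumes only the folder name given above and does not check which specific files Model_Lists requires.

```python
from pathlib import Path


def resolve_pretrained_dir(path: str = "pretrained_models") -> str:
    """Return the absolute model directory, failing early if it is missing or empty.

    Hypothetical helper: the default folder name follows the README, but the
    individual model files inside it are not validated here.
    """
    model_dir = Path(path).expanduser().resolve()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    if not any(model_dir.iterdir()):
        raise FileNotFoundError(f"model directory is empty: {model_dir}")
    return str(model_dir)
```

The returned string can then be passed as `pretrained_path` when constructing `FireRedTTS`.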
Basic Usage
import os
import torchaudio
from fireredtts.fireredtts import FireRedTTS
# acoustic LLM decoder
tts = FireRedTTS(
    config_path="configs/config_24k.json",
    pretrained_path="pretrained_models",  # folder holding the downloaded model files
)

# flow-matching decoder (alternative): uncomment to use it instead of the block above
# tts = FireRedTTS(
#     config_path="configs/config_24k_flow.json",
#     pretrained_path="pretrained_models",
# )
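# Same language: the prompt and the target text are both Chinese.
# use_tn=True turns on text normalization; it was enabled for the test-hard evaluation.
rec_wavs = tts.synthesize(
    prompt_wav="examples/prompt_1.wav",
    # "Right, so about this bill: since you say you can handle it, then find a way to take care of it."
    prompt_text="对,所以说你现在的话,这个账单的话,你既然说能处理,那你就想办法处理掉。",
    # "Xiaohongshu is an online shopping and social platform in mainland China, founded in June 2013."
    text="小红书,是中国大陆的网络购物和社交平台,成立于二零一三年六月。",
    lang="zh",
    use_tn=True,
)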
rec_wavs = rec_wavs.detach().cpu()
out_wav_path = os.path.join(".", "example.wav")
torchaudio.save(out_wav_path, rec_wavs, 24000)  # save at 24 kHz, matching the 24k configs
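The two decoders differ only in `config_path`, so the choice can be made a parameter instead of commenting code in and out. This is a sketch: `config_for_decoder` and the keys `"llm"` and `"flow"` are hypothetical names, while the two config paths come from the example above.

```python
# Config files named in the Basic Usage example.
DECODER_CONFIGS = {
    "llm": "configs/config_24k.json",        # acoustic LLM decoder
    "flow": "configs/config_24k_flow.json",  # flow-matching decoder
}


def config_for_decoder(decoder: str) -> str:
    """Map a decoder name to its config path (hypothetical helper)."""
    try:
        return DECODER_CONFIGS[decoder]
    except KeyError:
        raise ValueError(
            f"unknown decoder {decoder!r}; choose from {sorted(DECODER_CONFIGS)}"
        ) from None
```

For example, `FireRedTTS(config_path=config_for_decoder("flow"), pretrained_path="pretrained_models")` selects the flow-matching decoder; an unknown name fails with a clear error rather than a missing-file error later.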
Tips
- The reference audio should be neither too long nor too short; a duration of 3 to 10 seconds is recommended.
- The reference audio should be smooth and natural, and its transcript (prompt_text) must be accurate; both improve the stability and naturalness of the synthesized audio.
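The 3 to 10 second recommendation can be checked before synthesis. This minimal sketch uses Python's standard `wave` module and therefore assumes the prompt is an uncompressed PCM WAV file; the function names are hypothetical and not part of FireRedTTS.

```python
import wave

MIN_SECONDS = 3.0   # lower bound recommended in the Tips
MAX_SECONDS = 10.0  # upper bound recommended in the Tips


def prompt_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_prompt_wav(path: str) -> bool:
    """True if the reference clip length falls in the recommended 3-10 s range."""
    duration = prompt_duration_seconds(path)
    return MIN_SECONDS <= duration <= MAX_SECONDS
```

For instance, `check_prompt_wav("examples/prompt_1.wav")` returns `False` for a clip outside the recommended range, a cue to trim or replace the reference audio.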
⚠️ Usage Disclaimer ❗️❗️❗️❗️❗️❗️
- This project includes zero-shot voice cloning, a capability intended solely for academic research.
- DO NOT use this model for ANY illegal activities❗️❗️❗️❗️❗️❗️
- The developers assume no liability for any misuse of this model.
- If you identify any instances of abuse, misuse, or fraudulent activities related to this project, please report them to our team immediately.