usage help

#2
by erygvy - opened

Is there any chance that I can use this model with this? https://github.com/devnen/Chatterbox-TTS-Server
I guess it's just a string that needs to be swapped somewhere, but I think this would help it spread widely.

That would be awesome. I nearly asked the same question, just for this one: https://github.com/travisvn/chatterbox-tts-api

It would be awesome to use Kartoffelbox with OpenAI API support. I would like to use it inside Home Assistant as my TTS pipeline.

Please, please help :D

Hi, I would like to use it as a Chatterbox model, but I can't get any further either.

For the API the models are saved in ./models
https://github.com/travisvn/chatterbox-tts-api/blob/dc025b94b6926ab582d680f8fc8840312a325b1e/app/config.py#L32

You could start it first and then replace the t3 model in the ./models folder with the Kartoffel one.

For the other project it is similar
https://github.com/devnen/Chatterbox-TTS-Server/blob/5bf7293862363df835bc8bf67c2ef38860befa63/server.py#L146

Run it once, let it download everything, then replace the t3 model with mine.

Author of the Chatterbox TTS Server here. Instead of changing the Python script, you can try installing a clean copy into a separate folder. Do not run it yet. First change this setting in the config.yaml file in the root folder from:

model:
  repo_id: ResembleAI/chatterbox

to

model:
  repo_id: SebastianBodza/Kartoffelbox-v0.1

Then when the app is started, it should download the German model. You should probably change the reference files to point to audio files with German speech.
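If you set the server up repeatedly (for example when rebuilding a Docker image), the same config change could be scripted. Below is a minimal, stdlib-only sketch: `set_model_repo_id` is a hypothetical helper, not part of Chatterbox-TTS-Server, and it assumes the simple two-level `model:` / `repo_id:` layout quoted above. It edits the file line by line instead of going through a YAML library so that comments in config.yaml survive.

```python
from pathlib import Path


def set_model_repo_id(config_path: Path, new_repo_id: str) -> bool:
    """Replace model.repo_id in a simple YAML file, keeping comments intact.

    Hypothetical helper. Assumes a top-level `model:` key with an indented
    `repo_id:` entry below it. Returns True if a line was changed.
    """
    lines = config_path.read_text(encoding="utf-8").splitlines(keepends=True)
    in_model_block = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not end a block
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # A new top-level key starts; remember whether it is `model:`.
            in_model_block = stripped.startswith("model:")
            continue
        if in_model_block and stripped.startswith("repo_id:"):
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # keep the original line ending
            lines[i] = f"{line[:indent]}repo_id: {new_repo_id}{ending}"
            config_path.write_text("".join(lines), encoding="utf-8")
            return True
    return False
```

Usage, before the first start: `set_model_repo_id(Path("config.yaml"), "SebastianBodza/Kartoffelbox-v0.1")`.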

Posted this on this page too:
https://github.com/devnen/Chatterbox-TTS-Server/issues/25

This didn't work for me with Docker (I deleted all caches and rebuilt the image). It would always download ResembleAI/chatterbox into hf_cache/hub, even though the log showed it calling from_pretrained on the Kartoffelbox HF repo. My guess is that, since that repo only contains the t3 file and not the others, there is some kind of fallback, or maybe it was because of a missing Hugging Face token, as the repo is gated.
I then downloaded the file manually and replaced the blob in the hf_cache/hub directory with the Kartoffelbox t3 file.
Now it seems to work.

EDIT: Thank you for the training! It sounds good so far.

For everybody who needs more details on how to make it work with chatterbox-tts-server:

Here is a workaround:
I went to the folder .cache\huggingface\hub\models--ResembleAI--chatterbox\snapshots\, and inside the folder with the long hash-like name there is the t3_cfg.safetensors file (a symlink). I renamed it to BACKUPt3_cfg.safetensors, put t3_kartoffelbox.safetensors into that folder, renamed it to t3_cfg.safetensors, and it seems to work now. If you don't know where to find the .cache folder: on Windows it is in C:\Users\<your username>.
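The swap described above can also be scripted. This is a sketch under assumptions: the default hub cache location (the `HF_HOME` or `HF_HUB_CACHE` environment variables can move it), the `models--ResembleAI--chatterbox` folder name, and a Kartoffelbox t3 file you already downloaded; `swap_t3_weights` is a hypothetical helper. As in the manual steps, the original entry is renamed to BACKUPt3_cfg.safetensors rather than deleted. Renaming a symlink only moves the link, so the blob it points to stays in place.

```python
import shutil
from pathlib import Path

# Default Hugging Face hub cache (C:\Users\<you>\.cache\huggingface\hub on Windows).
# Assumption: HF_HOME / HF_HUB_CACHE are not set; otherwise pass the real path.
DEFAULT_HUB_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def swap_t3_weights(new_t3: Path, hub_cache: Path = DEFAULT_HUB_CACHE,
                    repo_folder: str = "models--ResembleAI--chatterbox") -> list[Path]:
    """Back up t3_cfg.safetensors in every snapshot folder and copy new_t3 in its place."""
    snapshots = hub_cache / repo_folder / "snapshots"
    targets = sorted(snapshots.glob("*/t3_cfg.safetensors"))
    if not targets:
        raise FileNotFoundError(f"no t3_cfg.safetensors under {snapshots}")
    for target in targets:
        backup = target.with_name("BACKUPt3_cfg.safetensors")
        if not backup.exists():
            target.rename(backup)  # keep the original (often a symlink)
        else:
            target.unlink()  # already backed up on an earlier run
        shutil.copy2(new_t3, target)
    return targets
```

Run it with the server stopped, e.g. `swap_t3_weights(Path("t3_kartoffelbox.safetensors"))`; restoring the original is the same rename in reverse.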

But it's strange that the other method doesn't work. It must somehow be possible to point chatterbox-tts-server directly at those files on my hard drive; I just don't know how.
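At the library level this seems possible already: the chatterbox Python package has a `ChatterboxTTS.from_local(ckpt_dir, device)` classmethod, and as far as I can tell from its source, `from_pretrained` just downloads the files and then calls it. Whether chatterbox-tts-server exposes a setting for a local folder is not answered here; this sketch assumes you call the library directly. `load_local_chatterbox`, `missing_checkpoint_files` and the paths in the comments are hypothetical, and the required-file list is an assumption based on the original Chatterbox checkpoint.

```python
from pathlib import Path

# Files from_local is expected to read (assumption based on the chatterbox source).
# conds.pt (the built-in voice) is loaded only if present, so it is not required here.
REQUIRED_FILES = ("ve.safetensors", "t3_cfg.safetensors",
                  "s3gen.safetensors", "tokenizer.json")


def missing_checkpoint_files(ckpt_dir) -> list[str]:
    """Return the required checkpoint files that are absent from ckpt_dir."""
    folder = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def load_local_chatterbox(ckpt_dir, device: str = "cuda"):
    """Load Chatterbox weights from a local folder instead of the Hub (hypothetical helper)."""
    missing = missing_checkpoint_files(ckpt_dir)
    if missing:
        raise FileNotFoundError(f"{ckpt_dir} is missing: {', '.join(missing)}")
    # Imported here so the file check above works even without chatterbox installed.
    from chatterbox.tts import ChatterboxTTS
    return ChatterboxTTS.from_local(Path(ckpt_dir), device)


# Example usage (needs the chatterbox package and torchaudio; paths are hypothetical):
#   model = load_local_chatterbox("kartoffelbox", device="cuda")
#   wav = model.generate("Guten Tag, wie geht es Ihnen heute?",
#                        audio_prompt_path="reference_de.wav")
#   torchaudio.save("output.wav", wav, model.sr)
```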

Just a heads up. I uploaded a new version with expressions. And I also included this time all necessary files from the original chatterbox. Usage with the different libraries should be easier.

Nice! I will try that later today. How do you control expressions? Is it like the fishaudio model, with brackets such as (laughing)? Which expressions is it capable of, and how do we control them?

For me the new files don't work. Do I need to update chatterbox-tts-server to a new version, or rebuild the image for a newer version of chatterbox-tts? It just outputs gibberish when I use the files from this repo.

Anyone got them to work yet?

@DarkNecrotic the tags are in the readme

@meganoob1337 are you using conds.pt (a precalculated reference speaker) or a reference mp3?

I also fine-tuned the encoder and t3cond, so old .pt files will not work. However, the cloning is a lot better.

I am using a reference mp3 in chatterbox-tts-server.
On my first try I replaced all the chatterbox files in the hf_cache (where the models are loaded from) with your versions, including conds.pt, although I don't know whether chatterbox-tts-server uses that. Maybe that is the problem :D

Oops... thanks! I should have read that first. Thanks for your work and effort!

For me it works when I put all files from this repo into C:\Users\<username>\.cache\huggingface\hub\models--ResembleAI--chatterbox\snapshots\<folder with a long hash-like name>.
I renamed all the files in there with a BACKUP prefix in case I need them later, and put all 5 files in there (conds.pt, s3gen.safetensors, t3_cfg.safetensors, tokenizer.json, ve.safetensors).
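If you are not sure the replacement took effect (for example when you still get gibberish), comparing checksums of the five files against your downloaded copy is a quick check. A minimal sketch: `compare_folders` is a hypothetical helper, and it assumes you keep a separate folder with the Kartoffelbox files to compare against. `open()` follows symlinks, so this also works if the snapshot entries still point into the blobs folder.

```python
import hashlib
from pathlib import Path

CHATTERBOX_FILES = ("conds.pt", "s3gen.safetensors", "t3_cfg.safetensors",
                    "tokenizer.json", "ve.safetensors")


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large weight files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:  # follows symlinks into the blobs folder
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_folders(snapshot_dir, kartoffel_dir) -> dict[str, str]:
    """Report per file whether the snapshot copy matches the Kartoffelbox download."""
    report = {}
    for name in CHATTERBOX_FILES:
        a = Path(snapshot_dir) / name
        b = Path(kartoffel_dir) / name
        if not a.exists() or not b.exists():
            report[name] = "missing"
        elif sha256_of(a) == sha256_of(b):
            report[name] = "match"
        else:
            report[name] = "DIFFERENT"
    return report
```

After a successful swap, every entry in the report should read "match".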

One thing I found out is that the sentence you type should not be too short. Try it with longer sentences, or two sentences, and it should work. If you write "Hi, my name is James." it outputs gibberish for me too, but if you write one or two more sentences after that, everything works.
(Maybe that has to do with the reference audio somehow?) I'm also using a reference audio file.
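If you generate the text yourself (for example in an automation) and split it into sentences before synthesis, one way to act on this observation is to merge very short sentences with their neighbors. This is a workaround sketch only: `merge_short_sentences` and the 60-character threshold are hypothetical, the sentence split is a simple regex, and it cannot help when the whole input is a single short sentence.

```python
import re

# Hypothetical threshold; the thread only reports that very short inputs misbehave.
MIN_CHARS = 60


def merge_short_sentences(text: str, min_chars: int = MIN_CHARS) -> list[str]:
    """Split text into sentences and merge neighbors until each chunk is long enough."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks: list[str] = []
    for sentence in sentences:
        if chunks and len(chunks[-1]) < min_chars:
            chunks[-1] = f"{chunks[-1]} {sentence}"  # previous chunk too short: extend it
        else:
            chunks.append(sentence)
    # Fold a short final chunk into the one before it, if there is one.
    if len(chunks) > 1 and len(chunks[-1]) < min_chars:
        tail = chunks.pop()
        chunks[-1] = f"{chunks[-1]} {tail}"
    return chunks
```

Each returned chunk would then be sent to the TTS model as one request.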

Yeah, I had that working with the old version, but the new one gave me some problems. It works now: I deleted the model_cache/.cache folder (there were some lock files in there), and after that it worked. I also played around a bit with the reference audio and the different seed configs, and now have it working decently for short sentences as well (my Home Assistant voice says "daran kann ich mich nicht erinnern" ("I can't remember that") in Olaf Scholz's voice when it doesn't recognize a command xD)
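Deleting the whole cache folder works but may force a re-download. A narrower alternative is to remove only leftover `*.lock` files while the server is stopped. In a standard Hugging Face hub cache these usually live in a `.locks` subfolder, but this sketch simply searches the whole directory you pass in. `remove_stale_locks` is hypothetical and defaults to a dry run that only lists what it would delete.

```python
from pathlib import Path


def remove_stale_locks(cache_dir, dry_run: bool = True) -> list[Path]:
    """Find (and optionally delete) leftover *.lock files under cache_dir.

    Only run this while the server is stopped: a lock held by a running
    download is not stale.
    """
    # rglob also descends into hidden folders such as .locks.
    locks = sorted(p for p in Path(cache_dir).rglob("*.lock") if p.is_file())
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```

Check the dry-run list first, e.g. `remove_stale_locks("model_cache/.cache")`, then call it again with `dry_run=False`.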
