Upload 12 files

Files changed (12)

LICENSE +373 -0
README.md +83 -0
demo.py +121 -0
server/Dockerfile +21 -0
server/Dockerfile.cpu +20 -0
server/Dockerfile.cuda121 +23 -0
server/main.py +185 -0
server/requirements.txt +12 -0
server/requirements_cpu.txt +11 -0
test/default_speaker.json +0 -0
test/requirements.txt +2 -0
test/test_streaming.py +127 -0

LICENSE ADDED

	@@ -0,0 +1,373 @@

+Mozilla Public License Version 2.0
+==================================
+1. Definitions
+--------------
+1.1. "Contributor"
+    means each individual or legal entity that creates, contributes to
+    the creation of, or owns Covered Software.
+1.2. "Contributor Version"
+    means the combination of the Contributions of others (if any) used
+    by a Contributor and that particular Contributor's Contribution.
+1.3. "Contribution"
+    means Covered Software of a particular Contributor.
+1.4. "Covered Software"
+    means Source Code Form to which the initial Contributor has attached
+    the notice in Exhibit A, the Executable Form of such Source Code
+    Form, and Modifications of such Source Code Form, in each case
+    including portions thereof.
+1.5. "Incompatible With Secondary Licenses"
+    means
+    (a) that the initial Contributor has attached the notice described
+        in Exhibit B to the Covered Software; or
+    (b) that the Covered Software was made available under the terms of
+        version 1.1 or earlier of the License, but not also under the
+        terms of a Secondary License.
+1.6. "Executable Form"
+    means any form of the work other than Source Code Form.
+1.7. "Larger Work"
+    means a work that combines Covered Software with other material, in
+    a separate file or files, that is not Covered Software.
+1.8. "License"
+    means this document.
+1.9. "Licensable"
+    means having the right to grant, to the maximum extent possible,
+    whether at the time of the initial grant or subsequently, any and
+    all of the rights conveyed by this License.
+1.10. "Modifications"
+    means any of the following:
+    (a) any file in Source Code Form that results from an addition to,
+        deletion from, or modification of the contents of Covered
+        Software; or
+    (b) any new file in Source Code Form that contains any Covered
+        Software.
+1.11. "Patent Claims" of a Contributor
+    means any patent claim(s), including without limitation, method,
+    process, and apparatus claims, in any patent Licensable by such
+    Contributor that would be infringed, but for the grant of the
+    License, by the making, using, selling, offering for sale, having
+    made, import, or transfer of either its Contributions or its
+    Contributor Version.
+1.12. "Secondary License"
+    means either the GNU General Public License, Version 2.0, the GNU
+    Lesser General Public License, Version 2.1, the GNU Affero General
+    Public License, Version 3.0, or any later versions of those
+    licenses.
+1.13. "Source Code Form"
+    means the form of the work preferred for making modifications.
+1.14. "You" (or "Your")
+    means an individual or a legal entity exercising rights under this
+    License. For legal entities, "You" includes any entity that
+    controls, is controlled by, or is under common control with You. For
+    purposes of this definition, "control" means (a) the power, direct
+    or indirect, to cause the direction or management of such entity,
+    whether by contract or otherwise, or (b) ownership of more than
+    fifty percent (50%) of the outstanding shares or beneficial
+    ownership of such entity.
+2. License Grants and Conditions
+--------------------------------
+2.1. Grants
+Each Contributor hereby grants You a world-wide, royalty-free,
+non-exclusive license:
+(a) under intellectual property rights (other than patent or trademark)
+    Licensable by such Contributor to use, reproduce, make available,
+    modify, display, perform, distribute, and otherwise exploit its
+    Contributions, either on an unmodified basis, with Modifications, or
+    as part of a Larger Work; and
+(b) under Patent Claims of such Contributor to make, use, sell, offer
+    for sale, have made, import, and otherwise transfer either its
+    Contributions or its Contributor Version.
+2.2. Effective Date
+The licenses granted in Section 2.1 with respect to any Contribution
+become effective for each Contribution on the date the Contributor first
+distributes such Contribution.
+2.3. Limitations on Grant Scope
+The licenses granted in this Section 2 are the only rights granted under
+this License. No additional rights or licenses will be implied from the
+distribution or licensing of Covered Software under this License.
+Notwithstanding Section 2.1(b) above, no patent license is granted by a
+Contributor:
+(a) for any code that a Contributor has removed from Covered Software;
+    or
+(b) for infringements caused by: (i) Your and any other third party's
+    modifications of Covered Software, or (ii) the combination of its
+    Contributions with other software (except as part of its Contributor
+    Version); or
+(c) under Patent Claims infringed by Covered Software in the absence of
+    its Contributions.
+This License does not grant any rights in the trademarks, service marks,
+or logos of any Contributor (except as may be necessary to comply with
+the notice requirements in Section 3.4).
+2.4. Subsequent Licenses
+No Contributor makes additional grants as a result of Your choice to
+distribute the Covered Software under a subsequent version of this
+License (see Section 10.2) or under the terms of a Secondary License (if
+permitted under the terms of Section 3.3).
+2.5. Representation
+Each Contributor represents that the Contributor believes its
+Contributions are its original creation(s) or it has sufficient rights
+to grant the rights to its Contributions conveyed by this License.
+2.6. Fair Use
+This License is not intended to limit any rights You have under
+applicable copyright doctrines of fair use, fair dealing, or other
+equivalents.
+2.7. Conditions
+Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
+in Section 2.1.
+3. Responsibilities
+-------------------
+3.1. Distribution of Source Form
+All distribution of Covered Software in Source Code Form, including any
+Modifications that You create or to which You contribute, must be under
+the terms of this License. You must inform recipients that the Source
+Code Form of the Covered Software is governed by the terms of this
+License, and how they can obtain a copy of this License. You may not
+attempt to alter or restrict the recipients' rights in the Source Code
+Form.
+3.2. Distribution of Executable Form
+If You distribute Covered Software in Executable Form then:
+(a) such Covered Software must also be made available in Source Code
+    Form, as described in Section 3.1, and You must inform recipients of
+    the Executable Form how they can obtain a copy of such Source Code
+    Form by reasonable means in a timely manner, at a charge no more
+    than the cost of distribution to the recipient; and
+(b) You may distribute such Executable Form under the terms of this
+    License, or sublicense it under different terms, provided that the
+    license for the Executable Form does not attempt to limit or alter
+    the recipients' rights in the Source Code Form under this License.
+3.3. Distribution of a Larger Work
+You may create and distribute a Larger Work under terms of Your choice,
+provided that You also comply with the requirements of this License for
+the Covered Software. If the Larger Work is a combination of Covered
+Software with a work governed by one or more Secondary Licenses, and the
+Covered Software is not Incompatible With Secondary Licenses, this
+License permits You to additionally distribute such Covered Software
+under the terms of such Secondary License(s), so that the recipient of
+the Larger Work may, at their option, further distribute the Covered
+Software under the terms of either this License or such Secondary
+License(s).
+3.4. Notices
+You may not remove or alter the substance of any license notices
+(including copyright notices, patent notices, disclaimers of warranty,
+or limitations of liability) contained within the Source Code Form of
+the Covered Software, except that You may alter any license notices to
+the extent required to remedy known factual inaccuracies.
+3.5. Application of Additional Terms
+You may choose to offer, and to charge a fee for, warranty, support,
+indemnity or liability obligations to one or more recipients of Covered
+Software. However, You may do so only on Your own behalf, and not on
+behalf of any Contributor. You must make it absolutely clear that any
+such warranty, support, indemnity, or liability obligation is offered by
+You alone, and You hereby agree to indemnify every Contributor for any
+liability incurred by such Contributor as a result of warranty, support,
+indemnity or liability terms You offer. You may include additional
+disclaimers of warranty and limitations of liability specific to any
+jurisdiction.
+4. Inability to Comply Due to Statute or Regulation
+---------------------------------------------------
+If it is impossible for You to comply with any of the terms of this
+License with respect to some or all of the Covered Software due to
+statute, judicial order, or regulation then You must: (a) comply with
+the terms of this License to the maximum extent possible; and (b)
+describe the limitations and the code they affect. Such description must
+be placed in a text file included with all distributions of the Covered
+Software under this License. Except to the extent prohibited by statute
+or regulation, such description must be sufficiently detailed for a
+recipient of ordinary skill to be able to understand it.
+5. Termination
+--------------
+5.1. The rights granted under this License will terminate automatically
+if You fail to comply with any of its terms. However, if You become
+compliant, then the rights granted under this License from a particular
+Contributor are reinstated (a) provisionally, unless and until such
+Contributor explicitly and finally terminates Your grants, and (b) on an
+ongoing basis, if such Contributor fails to notify You of the
+non-compliance by some reasonable means prior to 60 days after You have
+come back into compliance. Moreover, Your grants from a particular
+Contributor are reinstated on an ongoing basis if such Contributor
+notifies You of the non-compliance by some reasonable means, this is the
+first time You have received notice of non-compliance with this License
+from such Contributor, and You become compliant prior to 30 days after
+Your receipt of the notice.
+5.2. If You initiate litigation against any entity by asserting a patent
+infringement claim (excluding declaratory judgment actions,
+counter-claims, and cross-claims) alleging that a Contributor Version
+directly or indirectly infringes any patent, then the rights granted to
+You by any and all Contributors for the Covered Software under Section
+2.1 of this License shall terminate.
+5.3. In the event of termination under Sections 5.1 or 5.2 above, all
+end user license agreements (excluding distributors and resellers) which
+have been validly granted by You or Your distributors under this License
+prior to termination shall survive termination.
+************************************************************************
+*                                                                      *
+*  6. Disclaimer of Warranty                                           *
+*  -------------------------                                           *
+*                                                                      *
+*  Covered Software is provided under this License on an "as is"       *
+*  basis, without warranty of any kind, either expressed, implied, or  *
+*  statutory, including, without limitation, warranties that the       *
+*  Covered Software is free of defects, merchantable, fit for a        *
+*  particular purpose or non-infringing. The entire risk as to the     *
+*  quality and performance of the Covered Software is with You.        *
+*  Should any Covered Software prove defective in any respect, You     *
+*  (not any Contributor) assume the cost of any necessary servicing,   *
+*  repair, or correction. This disclaimer of warranty constitutes an   *
+*  essential part of this License. No use of any Covered Software is   *
+*  authorized under this License except under this disclaimer.         *
+*                                                                      *
+************************************************************************
+************************************************************************
+*                                                                      *
+*  7. Limitation of Liability                                          *
+*  --------------------------                                          *
+*                                                                      *
+*  Under no circumstances and under no legal theory, whether tort      *
+*  (including negligence), contract, or otherwise, shall any           *
+*  Contributor, or anyone who distributes Covered Software as          *
+*  permitted above, be liable to You for any direct, indirect,         *
+*  special, incidental, or consequential damages of any character      *
+*  including, without limitation, damages for lost profits, loss of    *
+*  goodwill, work stoppage, computer failure or malfunction, or any    *
+*  and all other commercial damages or losses, even if such party      *
+*  shall have been informed of the possibility of such damages. This   *
+*  limitation of liability shall not apply to liability for death or   *
+*  personal injury resulting from such party's negligence to the       *
+*  extent applicable law prohibits such limitation. Some               *
+*  jurisdictions do not allow the exclusion or limitation of           *
+*  incidental or consequential damages, so this exclusion and          *
+*  limitation may not apply to You.                                    *
+*                                                                      *
+************************************************************************
+8. Litigation
+-------------
+Any litigation relating to this License may be brought only in the
+courts of a jurisdiction where the defendant maintains its principal
+place of business and such litigation shall be governed by laws of that
+jurisdiction, without reference to its conflict-of-law provisions.
+Nothing in this Section shall prevent a party's ability to bring
+cross-claims or counter-claims.
+9. Miscellaneous
+----------------
+This License represents the complete agreement concerning the subject
+matter hereof. If any provision of this License is held to be
+unenforceable, such provision shall be reformed only to the extent
+necessary to make it enforceable. Any law or regulation which provides
+that the language of a contract shall be construed against the drafter
+shall not be used to construe this License against a Contributor.
+10. Versions of the License
+---------------------------
+10.1. New Versions
+Mozilla Foundation is the license steward. Except as provided in Section
+10.3, no one other than the license steward has the right to modify or
+publish new versions of this License. Each version will be given a
+distinguishing version number.
+10.2. Effect of New Versions
+You may distribute the Covered Software under the terms of the version
+of the License under which You originally received the Covered Software,
+or under the terms of any subsequent version published by the license
+steward.
+10.3. Modified Versions
+If you create software not governed by this License, and you want to
+create a new license for such software, you may create and use a
+modified version of this License if you rename the license and remove
+any references to the name of the license steward (except to note that
+such modified license differs from this License).
+10.4. Distributing Source Code Form that is Incompatible With Secondary
+Licenses
+If You choose to distribute Source Code Form that is Incompatible With
+Secondary Licenses under the terms of this version of the License, the
+notice described in Exhibit B of this License must be attached.
+Exhibit A - Source Code Form License Notice
+-------------------------------------------
+  This Source Code Form is subject to the terms of the Mozilla Public
+  License, v. 2.0. If a copy of the MPL was not distributed with this
+  file, You can obtain one at http://mozilla.org/MPL/2.0/.
+If it is not possible or desirable to put the notice in a particular
+file, then You may include the notice in a location (such as a LICENSE
+file in a relevant directory) where a recipient would be likely to look
+for such a notice.
+You may add additional accurate notices of copyright ownership.
+Exhibit B - "Incompatible With Secondary Licenses" Notice
+---------------------------------------------------------
+  This Source Code Form is "Incompatible With Secondary Licenses", as
+  defined by the Mozilla Public License, v. 2.0.

README.md ADDED

	@@ -0,0 +1,83 @@

+# XTTS streaming server
+*Warning: XTTS-streaming-server doesn't support concurrent streaming requests. It's a demo server, not meant for production.*
+https://github.com/coqui-ai/xtts-streaming-server/assets/17219561/7220442a-e88a-4288-8a73-608c4b39d06c
+## 1) Run the server
+### Use a pre-built image
+CUDA 12.1:
+```bash
+$ docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121
+```
+CUDA 11.8 (for older cards):
+```bash
+$ docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest
+```
+CPU (not recommended):
+```bash
+$ docker run -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest-cpu
+```
+Run with a fine-tuned model:
+Make sure the model folder `/path/to/model/folder` contains the following files:
+- `config.json`
+- `model.pth`
+- `vocab.json`
+```bash
+$ docker run -v /path/to/model/folder:/app/tts_models --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest
+```
+Setting the `COQUI_TOS_AGREED` environment variable to `1` indicates you have read and agreed to
+the terms of the [CPML license](https://coqui.ai/cpml). (Fine-tuned XTTS models are also under the [CPML license](https://coqui.ai/cpml).)
+### Build the image yourself
+To build the Docker image, run the commands below. The default `Dockerfile` uses PyTorch 2.1 and CUDA 11.8.
+Replace `DOCKERFILE` with `Dockerfile`, `Dockerfile.cpu`, `Dockerfile.cuda121`, or your own custom Dockerfile:
+```bash
+$ git clone git@github.com:coqui-ai/xtts-streaming-server.git
+$ cd xtts-streaming-server/server
+$ docker build -t xtts-stream . -f DOCKERFILE
+$ docker run --gpus all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 xtts-stream
+```
+Setting the `COQUI_TOS_AGREED` environment variable to `1` indicates you have read and agreed to
+the terms of the [CPML license](https://coqui.ai/cpml). (Fine-tuned XTTS models are also under the [CPML license](https://coqui.ai/cpml).)
+## 2) Testing the running server
+Once your Docker container is running, you can test that it's working properly. You will need to run the following code from a fresh terminal.
+### Clone `xtts-streaming-server` if you haven't already
+```bash
+$ git clone git@github.com:coqui-ai/xtts-streaming-server.git
+```
+### Using the Gradio demo
+```bash
+$ cd xtts-streaming-server
+$ python -m pip install -r test/requirements.txt
+$ python demo.py
+```
+### Using the test script
+```bash
+$ cd xtts-streaming-server/test
+$ python -m pip install -r requirements.txt
+$ python test_streaming.py
+```
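Before running the demo or the test script, it can help to confirm that the server has finished loading the model. The following is a minimal sketch and not part of the repository: it polls the `/languages` route defined in `server/main.py`, at the `http://localhost:8000` address implied by the `-p 8000:80` mapping in the `docker run` commands. The names `fetch_languages` and `wait_for_server`, their parameters, and the injectable `fetch` callable are hypothetical and chosen only for illustration.

```python
import json
import time
import urllib.request

SERVER_URL = "http://localhost:8000"  # assumed host port, from `-p 8000:80`


def fetch_languages(base_url):
    """GET /languages and return the decoded JSON body."""
    with urllib.request.urlopen(base_url + "/languages", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_server(base_url=SERVER_URL, attempts=30, delay=2.0, fetch=fetch_languages):
    """Poll the server until it answers, or give up after `attempts` tries.

    `fetch` is injectable so the retry loop can be exercised without a server.
    """
    for attempt in range(attempts):
        try:
            return fetch(base_url)
        except OSError:
            # URLError, ConnectionError, and timeouts all derive from OSError.
            if attempt < attempts - 1:
                time.sleep(delay)  # the model may still be loading
    raise RuntimeError("Server at %s did not respond" % base_url)
```

With a container running, `wait_for_server()` should return the language list once the model is loaded. The retry count and delay are guesses and may need tuning, since model download and loading times vary.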

demo.py ADDED

	@@ -0,0 +1,121 @@

+import gradio as gr
+import requests
+import base64
+import tempfile
+import json
+import os
+SERVER_URL = 'http://localhost:8000'
+OUTPUT = "./demo_outputs"
+cloned_speakers = {}
+print("Preparing file structure...")
+# Create the output folders if needed, then load any previously cloned speakers.
+os.makedirs(os.path.join(OUTPUT, "cloned_speakers"), exist_ok=True)
+os.makedirs(os.path.join(OUTPUT, "generated_audios"), exist_ok=True)
+for file in os.listdir(os.path.join(OUTPUT, "cloned_speakers")):
+    if file.endswith(".json"):
+        with open(os.path.join(OUTPUT, "cloned_speakers", file), "r") as fp:
+            cloned_speakers[file[:-5]] = json.load(fp)
+if cloned_speakers:
+    print("Available cloned speakers:", ", ".join(cloned_speakers.keys()))
+try:
+    print("Getting metadata from server ...")
+    LANUGAGES = requests.get(SERVER_URL + "/languages").json()
+    print("Available languages:", ", ".join(LANUGAGES))
+    STUDIO_SPEAKERS = requests.get(SERVER_URL + "/studio_speakers").json()
+    print("Available studio speakers:", ", ".join(STUDIO_SPEAKERS.keys()))
+except Exception as err:
+    raise Exception("Please make sure the server is running first.") from err
+def clone_speaker(upload_file, clone_speaker_name, cloned_speaker_names):
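+    # Open the reference audio in a context manager so the handle is closed after the upload.
+    with open(upload_file, "rb") as wav:
+        files = {"wav_file": ("reference.wav", wav)}
+        embeddings = requests.post(SERVER_URL + "/clone_speaker", files=files).json()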
+    with open(os.path.join(OUTPUT, "cloned_speakers", clone_speaker_name + ".json"), "w") as fp:
+        json.dump(embeddings, fp)
+    cloned_speakers[clone_speaker_name] = embeddings
+    cloned_speaker_names.append(clone_speaker_name)
+    return upload_file, clone_speaker_name, cloned_speaker_names, gr.Dropdown.update(choices=cloned_speaker_names)
+def tts(text, speaker_type, speaker_name_studio, speaker_name_custom, lang):
+    embeddings = STUDIO_SPEAKERS[speaker_name_studio] if speaker_type == 'Studio' else cloned_speakers[speaker_name_custom]
+    generated_audio = requests.post(
+        SERVER_URL + "/tts",
+        json={
+            "text": text,
+            "language": lang,
+            "speaker_embedding": embeddings["speaker_embedding"],
+            "gpt_cond_latent": embeddings["gpt_cond_latent"]
+        }
+    ).content
+    generated_audio_path = os.path.join(OUTPUT, "generated_audios", next(tempfile._get_candidate_names()) + ".wav")
+    with open(generated_audio_path, "wb") as fp:
+        fp.write(base64.b64decode(generated_audio))
+        return fp.name
+with gr.Blocks() as demo:
+    cloned_speaker_names = gr.State(list(cloned_speakers.keys()))
+    with gr.Tab("TTS"):
+        with gr.Column() as col4:
+            with gr.Row() as row4:
+                speaker_name_studio = gr.Dropdown(
+                    label="Studio speaker",
+                    choices=STUDIO_SPEAKERS.keys(),
+                    value="Asya Anara" if "Asya Anara" in STUDIO_SPEAKERS.keys() else None,
+                )
+                speaker_name_custom = gr.Dropdown(
+                    label="Cloned speaker",
+                    choices=cloned_speaker_names.value,
+                    value=cloned_speaker_names.value[0] if len(cloned_speaker_names.value) != 0 else None,
+                )
+            speaker_type = gr.Dropdown(label="Speaker type", choices=["Studio", "Cloned"], value="Studio")
+        with gr.Column() as col2:
+            lang = gr.Dropdown(label="Language", choices=LANUGAGES, value="en")
+            text = gr.Textbox(label="text", value="A quick brown fox jumps over the lazy dog.")
+            tts_button = gr.Button(value="TTS")
+        with gr.Column() as col3:
+            generated_audio = gr.Audio(label="Generated audio", autoplay=True)
+    with gr.Tab("Clone a new speaker"):
+        with gr.Column() as col1:
+            upload_file = gr.Audio(label="Upload reference audio", type="filepath")
+            clone_speaker_name = gr.Textbox(label="Speaker name", value="default_speaker")
+            clone_button = gr.Button(value="Clone speaker")
+    clone_button.click(
+        fn=clone_speaker,
+        inputs=[upload_file, clone_speaker_name, cloned_speaker_names],
+        outputs=[upload_file, clone_speaker_name, cloned_speaker_names, speaker_name_custom],
+    )
+    tts_button.click(
+        fn=tts,
+        inputs=[text, speaker_type, speaker_name_studio, speaker_name_custom, lang],
+        outputs=[generated_audio],
+    )
+if __name__ == "__main__":
+    print("Warming up server...")
+    with open("test/default_speaker.json", "r") as fp:
+        warmup_speaker = json.load(fp)
+    resp = requests.post(
+        SERVER_URL + "/tts",
+        json={
+            "text": "This is a warmup request.",
+            "language": "en",
+            "speaker_embedding": warmup_speaker["speaker_embedding"],
+            "gpt_cond_latent": warmup_speaker["gpt_cond_latent"],
+        }
+    )
+    resp.raise_for_status()
+    print("Starting the demo...")
+    demo.launch(
+        share=False,
+        debug=False,
+        server_port=3009,
+        server_name="0.0.0.0",
+    )
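The `tts` function above passes the raw response body straight to `base64.b64decode`. Since the `/tts` route in `server/main.py` returns a Python string, the body is presumably a JSON-encoded string (base64 text wrapped in double quotes); the default, non-validating `b64decode` discards the quotes because they are outside the base64 alphabet. Below is a sketch of a more explicit decode, under that assumption. `decode_tts_response` and `make_fake_response` are hypothetical helpers, and the fake response only stands in for a server reply so the example runs offline.

```python
import base64
import io
import json
import wave


def decode_tts_response(body: bytes) -> bytes:
    """Turn a JSON string of base64 text into raw WAV bytes."""
    b64_text = json.loads(body.decode("utf-8"))
    return base64.b64decode(b64_text, validate=True)


def make_fake_response(n_frames=240, sample_rate=24000) -> bytes:
    """Stand-in for a server reply: silent 16-bit mono WAV, base64, then JSON."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(b"\x00\x00" * n_frames)
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return json.dumps(encoded).encode("utf-8")


wav_bytes = decode_tts_response(make_fake_response())
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    # Inspect the decoded audio: sample rate and number of frames.
    print(wav.getframerate(), wav.getnframes())
```

Parsing the JSON first and decoding with `validate=True` makes a malformed reply fail loudly instead of being silently filtered, which may be easier to debug than the permissive default.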

server/Dockerfile ADDED

	@@ -0,0 +1,21 @@

+FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel
+ARG DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && \
+    apt-get install --no-install-recommends -y sox libsox-fmt-all curl wget gcc git git-lfs build-essential libaio-dev libsndfile1 ssh ffmpeg && \
+    apt-get clean && apt-get -y autoremove
+WORKDIR /app
+COPY requirements.txt .
+RUN python -m pip install --use-deprecated=legacy-resolver -r requirements.txt \
+    && python -m pip cache purge
+RUN python -m unidic download
+RUN mkdir -p /app/tts_models
+COPY main.py .
+ENV NVIDIA_DISABLE_REQUIRE=1
+ENV NUM_THREADS=2
+EXPOSE 80
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

server/Dockerfile.cpu ADDED

	@@ -0,0 +1,20 @@

+FROM python:3.11.7
+ARG DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && \
+    apt-get install --no-install-recommends -y sox libsox-fmt-all curl wget gcc git git-lfs build-essential libaio-dev libsndfile1 ssh ffmpeg && \
+    apt-get clean && apt-get -y autoremove
+WORKDIR /app
+COPY requirements_cpu.txt .
+RUN python -m pip install --use-deprecated=legacy-resolver -r requirements_cpu.txt \
+    && python -m pip cache purge
+RUN python -m unidic download
+RUN mkdir -p /app/tts_models
+COPY main.py .
+ENV USE_CPU=1
+EXPOSE 80
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

server/Dockerfile.cuda121 ADDED

	@@ -0,0 +1,23 @@

+FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel
+ARG DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && \
+    apt-get install --no-install-recommends -y sox libsox-fmt-all curl wget gcc git git-lfs build-essential libaio-dev libsndfile1 ssh ffmpeg && \
+    apt-get clean && apt-get -y autoremove
+WORKDIR /app
+COPY requirements.txt .
+RUN python -m pip install --use-deprecated=legacy-resolver -r requirements.txt \
+    && python -m pip cache purge
+RUN python -m unidic download
+RUN mkdir -p /app/tts_models
+COPY main.py .
+# Set this to 1 if you have an older card
+ENV NVIDIA_DISABLE_REQUIRE=0
+ENV NUM_THREADS=2
+EXPOSE 80
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

server/main.py ADDED

	@@ -0,0 +1,185 @@

+import base64
+import io
+import os
+import tempfile
+import wave
+import torch
+import numpy as np
+from typing import List
+from pydantic import BaseModel
+from fastapi import FastAPI, UploadFile, Body
+from fastapi.responses import StreamingResponse
+from TTS.tts.configs.xtts_config import XttsConfig
+from TTS.tts.models.xtts import Xtts
+from TTS.utils.generic_utils import get_user_data_dir
+from TTS.utils.manage import ModelManager
+torch.set_num_threads(int(os.environ.get("NUM_THREADS", os.cpu_count())))
+device = torch.device("cuda" if os.environ.get("USE_CPU", "0") == "0" else "cpu")
+if not torch.cuda.is_available() and device.type == "cuda":
+    raise RuntimeError("CUDA device unavailable, please use Dockerfile.cpu instead.")
+custom_model_path = os.environ.get("CUSTOM_MODEL_PATH", "/app/tts_models")
+if os.path.exists(custom_model_path) and os.path.isfile(custom_model_path + "/config.json"):
+    model_path = custom_model_path
+    print("Loading custom model from", model_path, flush=True)
+else:
+    print("Loading default model", flush=True)
+    model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
+    print("Downloading XTTS Model:", model_name, flush=True)
+    ModelManager().download_model(model_name)
+    model_path = os.path.join(get_user_data_dir("tts"), model_name.replace("/", "--"))
+    print("XTTS Model downloaded", flush=True)
+print("Loading XTTS", flush=True)
+config = XttsConfig()
+config.load_json(os.path.join(model_path, "config.json"))
+model = Xtts.init_from_config(config)
+model.load_checkpoint(config, checkpoint_dir=model_path, eval=True, use_deepspeed=device.type == "cuda")
+model.to(device)
+print("XTTS Loaded.", flush=True)
+print("Running XTTS Server ...", flush=True)
+##### Run fastapi #####
+app = FastAPI(
+    title="XTTS Streaming server",
+    description="""XTTS Streaming server""",
+    version="0.0.1",
+    docs_url="/",
+)
+@app.post("/clone_speaker")
+def predict_speaker(wav_file: UploadFile):
+    """Compute conditioning inputs from reference audio file."""
+    # Write the upload to a closed temporary file so the model can read it by path.
+    with tempfile.NamedTemporaryFile(delete=False) as temp:
+        temp.write(wav_file.file.read())
+        temp_audio_name = temp.name
+    try:
+        with torch.inference_mode():
+            gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
+                temp_audio_name
+            )
+    finally:
+        os.remove(temp_audio_name)
+    return {
+        "gpt_cond_latent": gpt_cond_latent.cpu().squeeze().half().tolist(),
+        "speaker_embedding": speaker_embedding.cpu().squeeze().half().tolist(),
+    }
+def postprocess(wav):
+    """Post process the output waveform"""
+    if isinstance(wav, list):
+        wav = torch.cat(wav, dim=0)
+    wav = wav.clone().detach().cpu().numpy()
+    wav = wav[None, : int(wav.shape[0])]
+    wav = np.clip(wav, -1, 1)
+    wav = (wav * 32767).astype(np.int16)
+    return wav
+def encode_audio_common(
+    frame_input, encode_base64=True, sample_rate=24000, sample_width=2, channels=1
+):
+    """Return base64 encoded audio"""
+    wav_buf = io.BytesIO()
+    with wave.open(wav_buf, "wb") as vfout:
+        vfout.setnchannels(channels)
+        vfout.setsampwidth(sample_width)
+        vfout.setframerate(sample_rate)
+        vfout.writeframes(frame_input)
+    wav_buf.seek(0)
+    if encode_base64:
+        b64_encoded = base64.b64encode(wav_buf.getbuffer()).decode("utf-8")
+        return b64_encoded
+    else:
+        return wav_buf.read()
+class StreamingInputs(BaseModel):
+    speaker_embedding: List[float]
+    gpt_cond_latent: List[List[float]]
+    text: str
+    language: str
+    add_wav_header: bool = True
+    stream_chunk_size: str = "20"
+def predict_streaming_generator(parsed_input: StreamingInputs):
+    speaker_embedding = torch.tensor(parsed_input.speaker_embedding).unsqueeze(0).unsqueeze(-1)
+    gpt_cond_latent = torch.tensor(parsed_input.gpt_cond_latent).reshape((-1, 1024)).unsqueeze(0)
+    text = parsed_input.text
+    language = parsed_input.language
+    stream_chunk_size = int(parsed_input.stream_chunk_size)
+    add_wav_header = parsed_input.add_wav_header
+    chunks = model.inference_stream(
+        text,
+        language,
+        gpt_cond_latent,
+        speaker_embedding,
+        stream_chunk_size=stream_chunk_size,
+        enable_text_splitting=True
+    )
+    for i, chunk in enumerate(chunks):
+        chunk = postprocess(chunk)
+        if i == 0 and add_wav_header:
+            # Emit a WAV header (built from empty frames) before the first audio chunk.
+            yield encode_audio_common(b"", encode_base64=False)
+        yield chunk.tobytes()
+@app.post("/tts_stream")
+def predict_streaming_endpoint(parsed_input: StreamingInputs):
+    return StreamingResponse(
+        predict_streaming_generator(parsed_input),
+        media_type="audio/wav",
+    )
+class TTSInputs(BaseModel):
+    speaker_embedding: List[float]
+    gpt_cond_latent: List[List[float]]
+    text: str
+    language: str
+@app.post("/tts")
+def predict_speech(parsed_input: TTSInputs):
+    speaker_embedding = torch.tensor(parsed_input.speaker_embedding).unsqueeze(0).unsqueeze(-1)
+    gpt_cond_latent = torch.tensor(parsed_input.gpt_cond_latent).reshape((-1, 1024)).unsqueeze(0)
+    text = parsed_input.text
+    language = parsed_input.language
+    out = model.inference(
+        text,
+        language,
+        gpt_cond_latent,
+        speaker_embedding,
+    )
+    wav = postprocess(torch.tensor(out["wav"]))
+    return encode_audio_common(wav.tobytes())
+@app.get("/studio_speakers")
+def get_speakers():
+    if hasattr(model, "speaker_manager") and hasattr(model.speaker_manager, "speakers"):
+        return {
+            speaker: {
+                "speaker_embedding": model.speaker_manager.speakers[speaker]["speaker_embedding"].cpu().squeeze().half().tolist(),
+                "gpt_cond_latent": model.speaker_manager.speakers[speaker]["gpt_cond_latent"].cpu().squeeze().half().tolist(),
+            }
+            for speaker in model.speaker_manager.speakers.keys()
+        }
+    else:
+        return {}
+@app.get("/languages")
+def get_languages():
+    return config.languages
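
The `/tts_stream` handler above sends a WAV header built from zero frames and then raw 16-bit PCM chunks. The following is a minimal, stdlib-only sketch of that framing, not the server's actual code: `wav_header`, `float_to_pcm16`, and `stream_audio` are hypothetical names, and plain Python lists stand in for the tensors that `postprocess` receives.

```python
import io
import struct
import wave


def wav_header(sample_rate=24000, sample_width=2, channels=1) -> bytes:
    """Build a WAV header with no frames, like encode_audio_common(b"", encode_base64=False)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(b"")
    return buf.getvalue()


def float_to_pcm16(samples) -> bytes:
    """Clip floats to [-1, 1], scale by 32767, and pack as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def stream_audio(chunks, add_wav_header=True):
    """Yield the header once (optionally), then one PCM payload per chunk."""
    if add_wav_header:
        yield wav_header()
    for chunk in chunks:
        yield float_to_pcm16(chunk)


header = wav_header()
payload = b"".join(stream_audio([[0.0, 0.5], [2.0, -2.0]]))
print(len(header), len(payload))
```

Because the header is written before any audio exists, its declared data length is zero, so a consumer has to keep reading to the end of the stream rather than trust that field.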

server/requirements.txt ADDED Viewed

	@@ -0,0 +1,12 @@

+TTS @ git+https://github.com/coqui-ai/TTS@fa28f99f1508b5b5366539b2149963edcb80ba62
+uvicorn[standard]==0.23.2
+fastapi==0.95.2
+deepspeed==0.10.3
+pydantic==1.10.13
+python-multipart==0.0.6
+typing-extensions>=4.8.0
+numpy==1.24.3
+cutlet
+mecab-python3==1.0.6
+unidic-lite==1.0.8
+unidic==1.1.0

server/requirements_cpu.txt ADDED Viewed

	@@ -0,0 +1,11 @@

+TTS @ git+https://github.com/coqui-ai/TTS@fa28f99f1508b5b5366539b2149963edcb80ba62
+uvicorn[standard]==0.23.2
+fastapi==0.95.2
+pydantic==1.10.13
+python-multipart==0.0.6
+typing-extensions>=4.8.0
+numpy==1.24.3
+cutlet
+mecab-python3==1.0.6
+unidic-lite==1.0.8
+unidic==1.1.0

test/default_speaker.json ADDED Viewed

The diff for this file is too large to render. See raw diff

test/requirements.txt ADDED Viewed

	@@ -0,0 +1,2 @@


+requests==2.31.0
+gradio==3.50.2

test/test_streaming.py ADDED Viewed

	@@ -0,0 +1,127 @@

+import argparse
+import json
+import shutil
+import subprocess
+import sys
+import time
+from typing import Iterator
+import requests
+def is_installed(lib_name: str) -> bool:
+    """Return True if the executable `lib_name` is found on PATH."""
+    return shutil.which(lib_name) is not None
+def save(audio: bytes, filename: str) -> None:
+    with open(filename, "wb") as f:
+        f.write(audio)
+def stream_ffplay(audio_stream, output_file, save=True):
+    if not save:
+        ffplay_cmd = ["ffplay", "-nodisp", "-probesize", "1024", "-autoexit", "-"]
+    else:
+        print("Saving to", output_file)
+        ffplay_cmd = ["ffmpeg", "-probesize", "1024", "-i", "-", output_file]
+    ffplay_proc = subprocess.Popen(ffplay_cmd, stdin=subprocess.PIPE)
+    for chunk in audio_stream:
+        if chunk is not None:
+            ffplay_proc.stdin.write(chunk)
+    # close on finish
+    ffplay_proc.stdin.close()
+    ffplay_proc.wait()
+def tts(text, speaker, language, server_url, stream_chunk_size) -> Iterator[bytes]:
+    start = time.perf_counter()
+    speaker["text"] = text
+    speaker["language"] = language
+    speaker["stream_chunk_size"] = stream_chunk_size  # a smaller value lowers latency but may degrade quality
+    res = requests.post(
+        f"{server_url}/tts_stream",
+        json=speaker,
+        stream=True,
+    )
+    end = time.perf_counter()
+    print(f"Time to make POST: {end-start}s", file=sys.stderr)
+    if res.status_code != 200:
+        print("Error:", res.text)
+        sys.exit(1)
+    first = True
+    for chunk in res.iter_content(chunk_size=512):
+        if first:
+            end = time.perf_counter()
+            print(f"Time to first chunk: {end-start}s", file=sys.stderr)
+            first = False
+        if chunk:
+            yield chunk
+    print("⏱️ response.elapsed:", res.elapsed)
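+def get_speaker(ref_audio, server_url):
+    # Upload the reference audio; the context manager closes the file after the request.
+    with open(ref_audio, "rb") as wav_file:
+        files = {"wav_file": ("reference.wav", wav_file)}
+        response = requests.post(f"{server_url}/clone_speaker", files=files)
+    return response.json()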
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--text",
+        default="It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
+        help="text input for TTS"
+    )
+    parser.add_argument(
+        "--language",
+        default="en",
+        help="Language to use; defaults to 'en' (English)"
+    )
+    parser.add_argument(
+        "--output_file",
+        default=None,
+        help="Save TTS output to given filename"
+    )
+    parser.add_argument(
+        "--ref_file",
+        default=None,
+        help="Reference audio file to use; when not given, the default speaker is used"
+    )
+    parser.add_argument(
+        "--server_url",
+        default="http://localhost:8000",
+        help="Server URL; defaults to http://localhost:8000"
+    )
+    parser.add_argument(
+        "--stream_chunk_size",
+        default="20",
+        help="Stream chunk size; defaults to 20. Smaller values reduce latency but may degrade quality"
+    )
+    args = parser.parse_args()
+    with open("./default_speaker.json", "r") as file:
+        speaker = json.load(file)
+    if args.ref_file is not None:
+        print("Computing the latents for a new reference...")
+        speaker = get_speaker(args.ref_file, args.server_url)
+    stream_ffplay(
+        tts(
+            args.text,
+            speaker,
+            args.language,
+            args.server_url,
+            args.stream_chunk_size
+        ),
+        args.output_file,
+        save=bool(args.output_file)
+    )
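
The `tts` generator above reports time to first chunk while it forwards bytes to the player. As a rough sketch, the same measurement can be factored into a reusable wrapper around any iterable of byte chunks. `timed_chunks` and its `stats` dictionary are hypothetical and not part of this test script; in the script, the input would be `res.iter_content(chunk_size=512)`.

```python
import time
from typing import Dict, Iterable, Iterator


def timed_chunks(chunks: Iterable[bytes], stats: Dict[str, float]) -> Iterator[bytes]:
    """Forward non-empty chunks and record latency figures in `stats`."""
    start = time.perf_counter()
    stats["bytes"] = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks, as the test script does
        if "first_chunk_s" not in stats:
            stats["first_chunk_s"] = time.perf_counter() - start
        stats["bytes"] += len(chunk)
        yield chunk
    stats["total_s"] = time.perf_counter() - start


stats: Dict[str, float] = {}
received = list(timed_chunks([b"ab", b"", b"cde"], stats))
print(received, stats["bytes"])
```

Keeping the timing in a wrapper leaves the request logic free of bookkeeping, and the wrapper can be exercised without a running server.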