RSHVR/Command_RTC

RSHVR committed on Mar 30

Commit 12d303c (verified) · Parent: f3c69f5

Update README.md


README.md +94 -1

@@ -9,6 +9,99 @@ app_file: app.py
 pinned: false
 license: apache-2.0
 short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
+tags:
+  - tortoise-tts
+  - text-to-speech
+  - voice-cloning
+  - gradio
+  - fastapi
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Tortoise TTS with Voice Cloning
+A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.
+## Description
+This application generates natural-sounding speech from text. You can choose the voice in one of three ways:
+- Uploading your own voice sample for cloning
+- Recording your voice directly in the browser
+- Selecting from a variety of preset voices
+The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs on Hugging Face Spaces using ZeroGPU.
+## How to Use
+### Web Interface
+1. Enter the text you want to convert to speech
+2. Choose one of the following voice options:
+   - Upload a voice sample audio file (WAV format recommended)
+   - Record your voice using your microphone
+   - Select a preset voice from the dropdown menu
+3. Click "Generate Speech"
+4. Listen to or download the generated audio
+### API Endpoints
+The app also provides REST API endpoints for programmatic access:
+1. **Voice File TTS** - `/api/tts_with_voice_file/`
+   - POST request with:
+     - `text`: Text to convert to speech (required)
+     - `voice_file`: Audio file for voice cloning (optional)
+     - `preset_voice`: Name of preset voice (optional, defaults to "random")
+2. **Preset Voice TTS** - `/api/tts_with_preset/`
+   - POST request with:
+     - `text`: Text to convert to speech (required)
+     - `preset_voice`: Name of preset voice (required)
+### Python Example
+```python
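+import requests
+
+# Request speech in a preset voice (replace the placeholder host with your Space URL)
+response = requests.post(
+    "https://your-space-name.hf.space/api/tts_with_preset/",
+    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
+    timeout=120,  # generation can take 30-60 seconds, so allow some headroom
+)
+response.raise_for_status()  # stop here rather than saving an error response as audio
+
+# Save the returned audio
+with open("output.wav", "wb") as f:
+    f.write(response.content)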
+```
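Review note: the README example covers only `/api/tts_with_preset/`. A minimal sketch for the `/api/tts_with_voice_file/` endpoint could look like the following. The field names `text`, `voice_file`, and `preset_voice` come from the endpoint list above. Everything else is an assumption for illustration: the helper names, the `sample.wav` filename, the `audio/wav` MIME type, the timeout value, and the idea that the response body is raw audio (as the preset example implies). The host is the README's placeholder.

```python
import requests

# Placeholder host taken from the README example; replace with the real Space URL.
BASE_URL = "https://your-space-name.hf.space"


def prepare_voice_file_request(text, voice_bytes=None, preset_voice=None):
    """Build, without sending, a POST to /api/tts_with_voice_file/ (hypothetical helper)."""
    data = {"text": text}
    if preset_voice is not None:
        data["preset_voice"] = preset_voice  # optional per the README
    files = None
    if voice_bytes is not None:
        # Filename and MIME type are assumptions; only the field name is documented.
        files = {"voice_file": ("sample.wav", voice_bytes, "audio/wav")}
    request = requests.Request(
        "POST", f"{BASE_URL}/api/tts_with_voice_file/", data=data, files=files
    )
    return request.prepare()


def synthesize_with_voice_file(text, voice_path, out_path="output.wav", timeout=120):
    """Send a voice sample for cloning and save the reply, assumed to be audio bytes."""
    with open(voice_path, "rb") as f:
        prepared = prepare_voice_file_request(text, voice_bytes=f.read())
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()  # do not save an error response as audio
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path
```

Calling `synthesize_with_voice_file("Hello, this is a test.", "my_voice.wav")` would then upload the sample and write `output.wav`, provided the Space accepts the request in this shape. Splitting preparation from sending keeps the request inspectable without network access.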
+## Technical Details
+This app leverages:
+- **Tortoise-TTS**: State-of-the-art text-to-speech model
+- **Gradio**: For the intuitive user interface
+- **FastAPI**: For the API endpoints
+- **ZeroGPU**: For efficient GPU utilization on Hugging Face Spaces
+## Limitations
+- Speech generation may take some time (30-60 seconds), depending on text length
+- Voice cloning quality depends on the clarity and length of the provided sample
+- For best results, provide voice samples with clear speech and minimal background noise
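Review note: since cloning quality depends on sample length, it may help to check a sample's duration before uploading. The sketch below uses only the standard-library `wave` module, so it handles uncompressed PCM WAV files only. The helper name is hypothetical, and no minimum duration is suggested here because the README does not state one.

```python
import wave


def wav_duration_seconds(source):
    """Return the duration of a PCM WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()
```

For example, `wav_duration_seconds("my_voice.wav")` returns the length in seconds as a float, which can be compared against whatever threshold works well in practice.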
+## Credits
+This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:
+```
+@misc{tortoise-tts,
+  author = {James Betker},
+  title = {Tortoise-TTS: A Multi-Voice TTS System},
+  year = {2022},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
+}
+```
+## License
+This project is available under the Apache-2.0 License.