colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: true
short_description: mcp_server
---

This Space is a Text-to-Speech (TTS) application built on the Zonos model. The description below is given in both English and Korean.

## English Explanation

### Overview

This is a Gradio-based web application for the **Zonos Text-to-Speech (TTS) Generator**. Zonos is an advanced TTS model from Zyphra that can generate natural-sounding speech with customizable voice characteristics.

### Key Features

1. **Model Selection**
   - Two model variants: Transformer and Hybrid
   - The variants support different conditioning capabilities

2. **Text Input & Language Support**
   - Supports multiple languages through eSpeak phoneme conversion
   - Text length limit of 500 characters
   - Language selected from the supported language codes

3. **Voice Customization**
   - **Speaker Cloning**: Upload audio to clone a specific voice
   - **Voice Quality Settings**:
     - DNS-MOS (Voice Quality): 1.0-5.0 scale
     - Frequency Max: Controls the highest frequency, in Hz
     - Voice Clarity: Adjusts voice intelligibility
     - Pitch Variation: Controls how much the pitch varies
     - Speaking Rate: Adjusts speech speed

4. **Emotion Control**
   - 8 emotion sliders: Happiness, Sadness, Disgust, Fear, Surprise, Anger, Other, Neutral
   - Fine-tune emotional expression in the generated speech

5. **Advanced Generation Parameters**
   - **Guidance Scale**: Controls how closely the model follows the conditioning
   - **Min P**: Min-p sampling threshold; lower values keep more candidate tokens and give more random, varied output
   - **Seed**: Fixes the random state for reproducible results
   - **Prefix Audio**: Continue generation from existing audio

6. **Unconditional Generation**
   - Toggle specific conditions to let the model generate them automatically
   - Useful for more creative or varied outputs
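
The **Min P** and **Seed** parameters can be illustrated with a minimal, self-contained sketch of min-p sampling over a single token distribution. It assumes the common min-p rule (keep every token whose probability is at least `min_p` times the top probability, then renormalize) and a seeded random generator; the helper names `min_p_filter` and `sample_token` are hypothetical and are not taken from the app or the Zonos library.

```python
import math
import random


def min_p_filter(probs: list[float], min_p: float) -> list[float]:
    """Min-p filtering: zero out tokens whose probability is below
    min_p * max(probs), then renormalize the survivors to sum to 1."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # > 0: the top token always survives when 0 <= min_p <= 1
    return [p / total for p in kept]


def sample_token(logits: list[float], min_p: float, seed: int) -> int:
    """Softmax the logits, apply min-p filtering, and draw one token index.

    A dedicated random.Random(seed) makes the draw reproducible: the same
    logits, min_p and seed always return the same index.
    """
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    z = sum(exps)
    probs = min_p_filter([e / z for e in exps], min_p)
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Threshold is 0.2 * 0.5 = 0.1, so only the 0.05 entry is zeroed.
    print(min_p_filter([0.5, 0.3, 0.15, 0.05], min_p=0.2))
    logits = [2.0, 1.5, 0.5, -1.0]
    # Same seed, same settings: same token.
    print(sample_token(logits, min_p=0.1, seed=42) == sample_token(logits, min_p=0.1, seed=42))  # True
```

Raising `min_p` toward 1.0 shrinks the candidate set until only the most likely token remains, which is why a low value reads as "more creative" in the UI.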

### Technical Details

- Uses GPU acceleration via CUDA
- Implements classifier-free guidance for better control
- Supports continuing audio from a prefix clip
- Real-time progress tracking during generation
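
Classifier-free guidance, controlled here by the **Guidance Scale** parameter, is commonly computed by obtaining one prediction with the conditioning and one without it, then extrapolating from the unconditional prediction toward the conditional one. The minimal sketch below shows only that combination rule on plain Python lists; `cfg_combine` is a hypothetical helper, and exactly where Zonos applies the rule (and how it batches the two passes) is an assumption not stated in this README.

```python
def cfg_combine(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance on one vector of logits.

    guided = uncond + scale * (cond - uncond)

    scale == 1.0 returns the conditional logits unchanged, scale == 0.0 returns
    the unconditional logits, and scale > 1.0 exaggerates whatever the
    conditioning (text, speaker, emotions, ...) changed.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [2.0, 0.0]    # logits from the pass that sees the conditioning
    uncond = [1.0, 1.0]  # logits from the pass with the conditioning dropped
    print(cfg_combine(cond, uncond, 2.0))  # [3.0, -1.0]
```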

### How to Use

1. Select a model variant
2. Enter your text and choose a language
3. (Optional) Upload speaker audio for voice cloning
4. Adjust voice characteristics and emotions
5. Click "Generate Audio" to create speech
6. Download or play the generated audio
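
The settings gathered in steps 1 to 4 can be pictured as a single request object that is checked before generation. The sketch below is hypothetical: `TTSRequest`, its field names, and the language-code format are assumptions; only the limits it enforces (Transformer or Hybrid variant, 500-character text, DNS-MOS 1.0-5.0, eight named emotion sliders) come from this README. It does not reproduce the app's `app.py` or the Zonos API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Limits taken from this README; everything else in this sketch is hypothetical.
MODEL_VARIANTS = ("Transformer", "Hybrid")
EMOTIONS = ("Happiness", "Sadness", "Disgust", "Fear", "Surprise", "Anger", "Other", "Neutral")
MAX_TEXT_CHARS = 500


@dataclass
class TTSRequest:
    """Hypothetical bundle of the settings chosen before clicking "Generate Audio"."""

    model_variant: str  # step 1: "Transformer" or "Hybrid"
    text: str           # step 2: at most 500 characters
    language: str       # step 2: a supported language code (format assumed, e.g. "en-us")
    dnsmos: float       # step 4: DNS-MOS voice-quality target on the 1.0-5.0 scale
    emotions: dict[str, float] = field(default_factory=dict)  # step 4: slider values by name
    speaker_audio_path: Optional[str] = None  # step 3: optional clip for voice cloning

    def validate(self) -> None:
        """Raise ValueError if a setting falls outside the limits stated in the README."""
        if self.model_variant not in MODEL_VARIANTS:
            raise ValueError(f"unknown model variant: {self.model_variant!r}")
        if not self.text or len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters")
        if not 1.0 <= self.dnsmos <= 5.0:
            raise ValueError("DNS-MOS must be between 1.0 and 5.0")
        unknown = set(self.emotions) - set(EMOTIONS)
        if unknown:
            raise ValueError(f"unknown emotion sliders: {sorted(unknown)}")

    def emotion_vector(self) -> list[float]:
        """The eight slider values in a fixed order; unset sliders become 0.0 (assumed)."""
        return [self.emotions.get(name, 0.0) for name in EMOTIONS]


if __name__ == "__main__":
    req = TTSRequest("Hybrid", "Hello from Zonos!", "en-us", dnsmos=4.0, emotions={"Happiness": 0.8})
    req.validate()
    print(req.emotion_vector())  # [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```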

---

## Korean Explanation

### Overview

This is a Gradio-based web application for the **Zonos Text-to-Speech (TTS) Generator**. Zonos is an advanced TTS model developed by Zyphra that lets users customize voice characteristics to generate natural-sounding speech.

### Key Features

1. **Model Selection**
   - Two model variants: Transformer and Hybrid
   - Each model offers different conditioning capabilities

2. **Text Input and Language Support**
   - Multilingual support through eSpeak phoneme conversion
   - Text length limit: 500 characters
   - Selectable from the supported language codes

3. **Voice Customization**
   - **Speaker Cloning**: Upload audio to clone a specific voice
   - **Voice Quality Settings**:
     - DNS-MOS (Voice Quality): 1.0-5.0 scale
     - Maximum Frequency: Controls the highest frequency, in Hz
     - Voice Clarity: Adjusts how intelligible the voice is
     - Pitch Variation: Controls the amount of pitch variation
     - Speaking Rate: Adjusts the speed of speech

4. **Emotion Control**
   - 8 emotion sliders: Happiness, Sadness, Disgust, Fear, Surprise, Anger, Other, Neutral
   - Finely adjust the emotional expression of the generated speech

5. **Advanced Generation Parameters**
   - **Guidance Scale**: Controls how faithfully the model follows the conditions
   - **Min P**: Controls the randomness/creativity of generation
   - **Seed**: Setting for reproducible results
   - **Prefix Audio**: Continue generating from existing audio

6. **Unconditional Generation**
   - Toggle specific conditions so the model generates them automatically
   - Useful for more creative and varied output

### Technical Details

- GPU acceleration via CUDA
- Classifier-free guidance implemented for finer control
- Support for continuing audio generation from a prefix
- Real-time progress tracking during generation

### How to Use

1. Select a model variant
2. Enter text and select a language
3. (Optional) Upload speaker audio for voice cloning
4. Adjust voice characteristics and emotions
5. Click the "Generate Audio" button to generate speech
6. Download or play the generated audio

### Special Features

- **Emotion Settings**: Finely control the emotional tone of the generated speech
- **Voice Quality**: Adjust voice quality via the DNS-MOS score
- **Speaker Noise Removal**: Option to remove noise from the uploaded speaker audio
- **Unconditional Keys**: Set specific features to be generated automatically by the model
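
The unconditional-key setting (the same idea as **Unconditional Generation** in the English feature list) can be sketched as filtering a conditioning dictionary before generation. This is a minimal illustration under an assumption the README does not confirm: that marking a key unconditional simply means its value is withheld from the model, which then decides that attribute itself. `apply_unconditional_keys` and the condition names in the example are hypothetical.

```python
def apply_unconditional_keys(
    conditions: dict[str, object], unconditional_keys: set[str]
) -> dict[str, object]:
    """Return a copy of `conditions` without the keys marked unconditional.

    Assumed mechanism for illustration: a dropped key means the model gets no
    value for that attribute and chooses it on its own. The input is not modified.
    """
    unknown = unconditional_keys - set(conditions)
    if unknown:
        raise KeyError(f"not a known condition: {sorted(unknown)}")
    return {key: value for key, value in conditions.items() if key not in unconditional_keys}


if __name__ == "__main__":
    # Illustrative condition names and values only.
    conditions = {
        "speaker": "cloned-voice-embedding",
        "emotion": [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2],
        "speaking_rate": 15.0,
    }
    # Let the model pick the emotion itself; keep the speaker and speaking rate.
    print(apply_unconditional_keys(conditions, {"emotion"}))
    # {'speaker': 'cloned-voice-embedding', 'speaking_rate': 15.0}
```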

This application is a powerful, flexible tool for high-quality TTS generation that can be used to create voice content for a wide range of purposes.