---
library_name: transformers
tags:
- unsloth
license: cc-by-nc-4.0
language:
- de
base_model:
- canopylabs/orpheus-3b-0.1-ft
pipeline_tag: text-to-speech
---

# SauerkrautTTS-Preview-0.1

![SauerkrautTTS-Preview-0.1](https://vago-solutions.ai/wp-content/uploads/2025/03/SauerkrautTTS.png)

**SauerkrautTTS-Preview-0.1** is a fine-tuned Text-to-Speech (TTS) model based on the powerful [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft). 

This preview model introduces four distinct German-speaking voices (**Lena**, **Anna**, **Max**, and **Tom**), built from original audio recordings captured with a Rhode Studio microphone and Mimic Studio, alongside carefully curated synthetic data. The quality and careful curation of the overall dataset enable the model to produce clear and natural speech, even in this initial release.


## Model Details

- **Base Model:** [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft)
- **Languages Supported:** German
- **License:** CC BY-NC 4.0
- **Model Version:** Preview 0.1 (initial release)

## Speaker Data Breakdown

| Speaker | Original Data (Hours) | Synthetic Data (Hours) | Total (Hours) |
|---------|-----------------------|------------------------|---------------|
| Tom     | 1.00                  | 3.80                   | 4.80          |
| Anna    | 3.00                  | 1.25                   | 4.25          |
| Max     | -                     | 4.78                   | 4.78          |
| Lena    | -                     | 4.87                   | 4.87          |

The synthetic audio data enriches the model, resulting in versatile and expressive voice capabilities.
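
As a quick check of the table above, the following sketch sums the per-speaker hours and computes the share of synthetic data. The figures are copied from the table; the variable names are illustrative, and speakers with no original recordings are entered as 0.

```python
# Hours per speaker as (original, synthetic), copied from the table above.
# A "-" in the table is treated as 0 hours of original recordings.
speaker_hours = {
    "Tom": (1.0, 3.8),
    "Anna": (3.0, 1.25),
    "Max": (0.0, 4.78),
    "Lena": (0.0, 4.87),
}

original_total = sum(orig for orig, _ in speaker_hours.values())
synthetic_total = sum(synth for _, synth in speaker_hours.values())
overall_total = original_total + synthetic_total

# Fraction of the training audio that is synthetic.
synthetic_share = synthetic_total / overall_total

print(f"Original:  {original_total:.2f} h")
print(f"Synthetic: {synthetic_total:.2f} h")
print(f"Total:     {overall_total:.2f} h")
print(f"Synthetic share: {synthetic_share:.1%}")
```

Roughly four fifths of the audio is synthetic, and two of the four voices (Max and Lena) are built entirely from synthetic data.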

## Example Output and Comparison

<video width="600" controls>
  <source src="https://vago-solutions.ai/wp-content/uploads/2025/03/SauerkrautTTS.mp4" type="video/mp4">
  Your browser does not support video playback.
</video>

## Usage

For inference instructions and ready-to-use scripts, see the Orpheus resources below:

- **Colab Notebook:** [Orpheus Colab](https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing)
- **GitHub Repository:** [Orpheus GitHub](https://github.com/canopyai/Orpheus-TTS)

To achieve optimal results, we recommend using a **lower temperature** for clear and stable outputs. Higher temperatures will enhance dynamism and expressiveness but might introduce instability.

Example inference settings:
```python
temperature = 0.5  # Adjust lower for clearer output, higher for creativity
```
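
To illustrate why a lower temperature tends to give more stable output, here is a minimal sketch of temperature-scaled softmax sampling in general. It is not the model's actual decoding code: the function name `softmax_with_temperature` and the logit values are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]

for temperature in (0.3, 0.5, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At lower temperatures the most likely token receives a larger share of the probability mass, so sampling becomes more predictable; at higher temperatures the distribution flattens and less likely tokens are chosen more often.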

## Future Plans

This model represents our first exploratory step into advanced German-language TTS. Expect significant improvements in upcoming versions, including:
- Enhanced voice clarity
- Expanded speaker diversity
- Greater stability across temperature ranges

Stay tuned for future releases and updates!

## License

SauerkrautTTS-Preview-0.1 is openly available under the [CC BY-NC 4.0 License](https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial reuse, remixing, and improvement by the community, provided attribution is given.

## Acknowledgments

We thank [Unsloth](https://unsloth.ai) for their invaluable training script, which we used in lightly modified form to train this model.
We also thank the German Federal Ministry of Education and Research (BMBF) for funding our Project ARGUS, in which we developed SauerkrautTTS.