# 🗣️ ssml-break2ssml-fr-lora

This is the second-stage LoRA adapter for French SSML generation, converting pause-annotated text into full SSML markup with `<break>` tags.
This model is part of the cascade described in the paper:
"Improving French Synthetic Speech Quality via SSML Prosody Control" Nassima Ould-Ouali, Γric Moulines β ICNLSP 2025 (Springer LNCS) [accepted].
## 🔧 Model Details

- Base model: `Qwen/Qwen2.5-7B`
- Adapter method: LoRA (Low-Rank Adaptation via `peft`)
- LoRA rank: 8 – Alpha: 16
- Training: 5 epochs, batch size 1 (gradient accumulation)
- Languages: French
- Model size: 7B (adapter-only)
- License: Apache 2.0
## 🧩 Pipeline Overview

This model is part of a two-stage SSML cascade for improving French TTS prosody:

| Step | Model | Description |
|---|---|---|
| 1️⃣ | `nassimaODL/ssml-text2breaks-fr-lora` | Inserts symbolic pauses like `#250`, `#500` |
| 2️⃣ | `nassimaODL/ssml-break2ssml-fr-lora` | Converts the symbols to `<break time="..."/>` SSML |
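The two stages can be chained at inference time by loading both adapters on one base model and switching between them. The snippet below is a minimal sketch: the adapter names (`text2breaks`, `breaks2ssml`), the helper `run_stage`, and the assumption that both stages take plain text with no prompt template are illustrative choices, not part of the released configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", device_map="auto")

# Attach stage 1, then load stage 2 as a second adapter on the same base model
model = PeftModel.from_pretrained(base, "nassimaODL/ssml-text2breaks-fr-lora", adapter_name="text2breaks")
model.load_adapter("nassimaODL/ssml-break2ssml-fr-lora", adapter_name="breaks2ssml")

def run_stage(text: str, adapter: str) -> str:
    """Generate with the selected LoRA adapter and return only the newly generated tokens."""
    model.set_adapter(adapter)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

with_breaks = run_stage("Bonjour comment vas-tu ?", "text2breaks")  # e.g. "Bonjour#250 comment vas-tu ?"
ssml = run_stage(with_breaks, "breaks2ssml")                        # e.g. 'Bonjour<break time="250ms"/> ...'
```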
## ✨ Example

Input: `Bonjour#250 comment vas-tu ?`

Output: `Bonjour<break time="250ms"/> comment vas-tu ?`
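For the simple case above, the target format corresponds to a deterministic rewrite of `#N` markers into `<break time="Nms"/>` tags. The regex below only illustrates the expected output format; the model itself is trained to produce well-formed SSML around arbitrary text rather than applying a fixed rule.

```python
import re

def breaks_to_ssml_reference(text: str) -> str:
    """Reference rewrite of '#N' pause markers into SSML break tags (format illustration only)."""
    return re.sub(r"#(\d+)", r'<break time="\1ms"/>', text)

print(breaks_to_ssml_reference("Bonjour#250 comment vas-tu ?"))
# Bonjour<break time="250ms"/> comment vas-tu ?
```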
π How to run the code
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", device_map="auto")
model = PeftModel.from_pretrained(base_model, "nassimaODL/ssml-break2ssml-fr-lora")
input_text = "Bonjour#250 comment vas-tu ?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
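If latency matters, the adapter can optionally be folded into the base weights after loading, so inference no longer goes through the PEFT wrapper. This is a standard `peft` option rather than something specific to this release; the snippet continues from the variables above.

```python
# Optional: merge the LoRA weights into the base model for faster inference
merged_model = model.merge_and_unload()
outputs = merged_model.generate(**inputs, max_new_tokens=128)
```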
## 🧪 Evaluation Summary

| Metric | Value |
|---|---|
| Pause insertion accuracy | 87.3% |
| RMSE (pause duration) | 98.5 ms |
| MOS gain (vs. baseline) | +0.42 |
Evaluation was performed on a held-out French validation set with annotated SSML pauses. Mean Opinion Score (MOS) improvements were assessed on TTS outputs rendered with the Azure Henri voice and rated by 30 native French speakers.
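The two objective metrics above can be reproduced from aligned predicted/reference pause lists. The sketch below shows one plausible way to compute them; the `aligned` data, and the definition of insertion accuracy as the fraction of reference pauses matched by a predicted pause, are illustrative assumptions (the exact alignment procedure is described in the paper).

```python
import math

# Hypothetical aligned pauses: (reference pause matched?, predicted ms, reference ms)
aligned = [(True, 250, 300), (True, 500, 450), (False, None, 250)]

# Pause insertion accuracy: share of reference pauses that received a predicted pause
accuracy = sum(1 for matched, _, _ in aligned if matched) / len(aligned)

# RMSE over the durations of the matched pauses
errors = [(pred - ref) ** 2 for matched, pred, ref in aligned if matched]
rmse = math.sqrt(sum(errors) / len(errors))

print(f"accuracy = {accuracy:.1%}, RMSE = {rmse:.1f} ms")
```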
## 📚 Training Data

This LoRA adapter was trained on a corpus of ~4,500 French utterances. Input texts were annotated with symbolic pause indicators (e.g., `#250` for 250 ms), automatically aligned using a combination of Whisper-Kyutai timestamping and F0/syntactic heuristics.

Annotations were refined via a hybrid heuristic rule set combining:
- Voice activity boundaries (via Auditok; see the sketch below)
- F0 contour analysis (pitch dips before breaks)
- Syntactic cues (punctuation, conjunctions)

For full details, see our data preparation pipeline on GitHub:
🔗 https://github.com/NassimaOULDOUALI/Prosody-Control-French-TTS
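As a rough illustration of the first heuristic, speech/silence boundaries can be extracted with Auditok's energy-based splitter; the file name and threshold values below are placeholders, not the settings used to build the corpus.

```python
import auditok

# Split an utterance into speech regions; gaps between consecutive regions are pause candidates
regions = list(auditok.split("utterance.wav",
                             min_dur=0.2, max_dur=10.0,
                             max_silence=0.3, energy_threshold=55))

for prev, nxt in zip(regions, regions[1:]):
    pause_ms = (nxt.meta.start - prev.meta.end) * 1000
    print(f"pause candidate of {pause_ms:.0f} ms ending at {nxt.meta.start:.2f}s")
```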
## ⚙️ Training Setup

- Compute: Jean-Zay (GENCI/IDRIS), 1× A100 80GB
- Framework: Hugging Face `transformers` + `peft`
- LoRA method: rank = 8, alpha = 16, dropout = 0.05
- Precision: bf16
- Max sequence length: 768 tokens (256 input + 512 output)
- Epochs: 5
- Optimizer: AdamW (lr = 2e-4, no warmup)
- LoRA target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`

Training was performed using the Unsloth SFTTrainer and PEFT adapter injection on the Qwen2.5-7B base model.
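For reference, the hyperparameters above map onto a `peft` `LoraConfig` roughly as follows. The actual run used the Unsloth SFTTrainer; this is only a sketch of the equivalent PEFT setup, not the released training script.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# LoRA hyperparameters as listed in this card
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", torch_dtype=torch.bfloat16)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```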
## ⚠️ Limitations

- Only `<break>` tags are supported; no pitch, rate, or emphasis control yet.
- Pause accuracy is sensitive to punctuation and malformed inputs.
- SSML output has been optimized primarily for Azure voices (e.g., `fr-FR-HenriNeural`). Other engines may interpret `<break>` tags differently.
- The model assumes the presence of symbolic pause markers in the input (e.g., `#250`). For automatic prediction of such symbols, refer to our stage-1 model: 🔗 `nassimaODL/ssml-text2breaks-fr-lora`
## 📖 Citation

```bibtex
@inproceedings{ould-ouali2025improving,
  author    = {Nassima Ould-Ouali and Awais Sani and Tim Luka Horstmann and Jonah Dauvet and Ruben Bueno and Éric Moulines},
  title     = {Improving French Synthetic Speech Quality via SSML Prosody Control},
  booktitle = {Proceedings of the 9th International Conference on Natural Language and Speech Processing (ICNLSP)},
  series    = {Lecture Notes in Computer Science},
  publisher = {Springer},
  year      = {2025},
  note      = {To appear}
}
```