πŸ—£οΈ ssml-break2ssml-fr-lora

This is the second-stage LoRA adapter for French SSML generation, converting pause-annotated text into full SSML markup with <break> tags.

This model is part of the cascade described in the paper:

"Improving French Synthetic Speech Quality via SSML Prosody Control" Nassima Ould-Ouali, Γ‰ric Moulines – ICNLSP 2025 (Springer LNCS) [accepted].


🧠 Model Details

  • Base model: Qwen/Qwen2.5-7B
  • Adapter method: LoRA (Low-Rank Adaptation via peft)
  • LoRA rank: 8, alpha: 16
  • Training: 5 epochs, batch size 1 (gradient accumulation)
  • Languages: French
  • Model size: 7B base (this repository contains the adapter weights only)
  • License: Apache 2.0

🧩 Pipeline Overview

This model is part of a two-stage SSML cascade for improving French TTS prosody:

| Step | Model | Description |
|------|-------|-------------|
| 1️⃣ | `nassimaODL/ssml-text2breaks-fr-lora` | Inserts symbolic pauses like `#250`, `#500` |
| 2️⃣ | `nassimaODL/ssml-break2ssml-fr-lora` | Converts symbols to `<break time="..."/>` SSML |

✨ Example

```
Input:  Bonjour#250 comment vas-tu ?
Output: Bonjour<break time="250ms"/> comment vas-tu ?
```

πŸš€ How to use


```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the Qwen2.5-7B base model and attach the stage-2 LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", device_map="auto")
model = PeftModel.from_pretrained(base_model, "nassimaODL/ssml-break2ssml-fr-lora")

# The input already carries symbolic pause markers produced by the stage-1 model
input_text = "Bonjour#250 comment vas-tu ?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
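
To run the full cascade from plain text, the stage-1 adapter can be chained in front of this model. A minimal sketch, assuming each adapter maps its input text directly to annotated output with no extra prompt template (check each model card for the exact input format):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

def run(model, text):
    # Greedy decoding: markup insertion should be deterministic
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Causal LMs echo the prompt; strip it if your post-processing
    # expects only the annotated continuation.
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Stage 1: plain text -> symbolic pause markers (#250, #500, ...)
stage1 = PeftModel.from_pretrained(base_model, "nassimaODL/ssml-text2breaks-fr-lora")
with_breaks = run(stage1, "Bonjour comment vas-tu ?")

# Detach the stage-1 LoRA layers before injecting stage 2
base_model = stage1.unload()

# Stage 2: symbolic markers -> <break time="..."/> SSML
stage2 = PeftModel.from_pretrained(base_model, "nassimaODL/ssml-break2ssml-fr-lora")
print(run(stage2, with_breaks))
```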

πŸ§ͺ Evaluation Summary

| Metric | Value |
|--------|-------|
| Pause insertion accuracy | 87.3% |
| RMSE (pause duration) | 98.5 ms |
| MOS gain (vs. baseline) | +0.42 |

Evaluation was performed on a held-out French validation set with annotated SSML pauses. Mean Opinion Score (MOS) improvements were assessed using TTS outputs rendered with the Azure fr-FR-HenriNeural voice and rated by 30 native French speakers.
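
For reference, the two automatic metrics can be computed along the following lines. A rough sketch, not the paper's exact protocol: it assumes a break counts as correctly inserted when predicted and reference tags occur at the same character position in the tag-stripped text, and RMSE is taken over the durations of matched breaks:

```python
import math
import re

BREAK_RE = re.compile(r'<break time="(\d+)ms"\s*/>')

def parse_breaks(ssml):
    """Map each break's position in the tag-stripped text to its duration (ms)."""
    positions, removed = {}, 0
    for m in BREAK_RE.finditer(ssml):
        positions[m.start() - removed] = int(m.group(1))
        removed += m.end() - m.start()
    return positions

def score(pred_ssml, ref_ssml):
    pred, ref = parse_breaks(pred_ssml), parse_breaks(ref_ssml)
    matched = set(pred) & set(ref)
    accuracy = len(matched) / max(len(ref), 1)
    rmse = math.sqrt(
        sum((pred[p] - ref[p]) ** 2 for p in matched) / max(len(matched), 1)
    )
    return accuracy, rmse

acc, rmse = score('Bonjour<break time="300ms"/> Γ§a va ?',
                  'Bonjour<break time="250ms"/> Γ§a va ?')
# acc == 1.0, rmse == 50.0
```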


πŸ“š Training Data

This LoRA adapter was trained on a corpus of ~4,500 French utterances. Input texts were annotated with symbolic pause indicators (e.g., #250 for 250ms), automatically aligned using a combination of Whisper-Kyutai timestamping and F0/syntactic heuristics.

Annotations were refined via a hybrid heuristic rule set combining the following cues (a simplified sketch follows the list):

  • Voice activity boundaries (via Auditok)
  • F0 contour analysis (pitch dips before breaks)
  • Syntactic cues (punctuation, conjunctions)
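
Once a candidate boundary is confirmed, its measured silence is snapped to a symbolic marker. A simplified illustration with assumed marker values and thresholds, not the exact heuristics from the pipeline:

```python
# Hypothetical duration grid and threshold; the real pipeline derives
# markers from Whisper-Kyutai timestamps refined by the cues listed above.
MARKERS_MS = [250, 500, 750, 1000]
MIN_PAUSE_MS = 150  # assumed: shorter silences are not annotated

def pause_to_marker(silence_ms):
    """Snap a measured inter-word silence to the nearest symbolic marker."""
    if silence_ms < MIN_PAUSE_MS:
        return None
    nearest = min(MARKERS_MS, key=lambda m: abs(m - silence_ms))
    return f"#{nearest}"

assert pause_to_marker(280) == "#250"   # "Bonjour" + 280 ms silence -> "Bonjour#250"
assert pause_to_marker(90) is None
```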

For full details, see our data preparation pipeline on GitHub:
πŸ”— https://github.com/NassimaOULDOUALI/Prosody-Control-French-TTS


βš™οΈ Training Setup

  • Compute: Jean-Zay (GENCI/IDRIS), 1Γ— A100 80 GB
  • Framework: HuggingFace transformers + peft
  • LoRA method: rank = 8, alpha = 16, dropout = 0.05
  • Precision: bf16
  • Max sequence length: 768 tokens (256 input + 512 output)
  • Epochs: 5
  • Optimizer: AdamW (lr = 2e-4, no warmup)
  • LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Training was performed using the Unsloth SFTTrainer with PEFT adapter injection on the Qwen2.5-7B base model.
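
The hyperparameters above translate roughly into the following peft/transformers configuration. A sketch for reproducibility only; the actual run used the Unsloth SFTTrainer, and the gradient-accumulation value is an assumption (the card only states that accumulation was used):

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                       # LoRA rank
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
)

training_args = TrainingArguments(
    output_dir="breaks2ssml-lora",
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # assumed; card only says "gradient accumulation"
    learning_rate=2e-4,
    warmup_steps=0,                  # no warmup
    bf16=True,
    optim="adamw_torch",
)
```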


⚠️ Limitations

  • Only <break> tags are supported; no pitch, rate, or emphasis control yet.
  • Pause accuracy is sensitive to punctuation and malformed inputs.
  • SSML output has been optimized primarily for Azure voices (e.g., fr-FR-HenriNeural). Other engines may interpret <break> tags differently; see the wrapping sketch after this list.
  • The model assumes the presence of symbolic pause markers in the input (e.g., #250). For automatic prediction of such symbols, refer to our stage-1 model:
    πŸ”— nassimaODL/ssml-text2breaks-fr-lora
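
The model emits a bare sentence fragment with <break/> tags, so it must be wrapped in an SSML envelope before being sent to a TTS engine. A minimal wrapper for the Azure voice used in evaluation, following Azure's standard SSML document format:

```python
def wrap_ssml(fragment, voice="fr-FR-HenriNeural"):
    """Wrap stage-2 output in a minimal SSML document for Azure TTS."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="fr-FR">'
        f'<voice name="{voice}">{fragment}</voice>'
        "</speak>"
    )

print(wrap_ssml('Bonjour<break time="250ms"/> comment vas-tu ?'))
```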

πŸ“– Citation

```bibtex
@inproceedings{ould-ouali2025improving,
  author    = {Nassima Ould-Ouali and Awais Sani and Tim Luka Horstmann and Jonah Dauvet and Ruben Bueno and Γ‰ric Moulines},
  title     = {Improving French Synthetic Speech Quality via SSML Prosody Control},
  booktitle = {Proceedings of the 9th International Conference on Natural Language and Speech Processing (ICNLSP)},
  series    = {Lecture Notes in Computer Science},
  publisher = {Springer},
  year      = {2025},
  note      = {To appear}
}
```
