This model is a fine-tuned version of nari-labs/Dia-1.6B, trained specifically for Dhivehi (Maldivian) text-to-speech synthesis.
# Install the Dia library and soundfile first:
#   pip install git+https://github.com/nari-labs/dia.git
#   pip install soundfile

SAMPLE_RATE = 44100  # Dia generates audio at 44.1 kHz

print("🎤 Testing Dhivehi Dia TTS model...")

try:
    # Import inside the try block so the ImportError handler below
    # can actually catch a missing package.
    from dia.model import Dia
    import soundfile as sf

    # Load the fine-tuned model
    print("📥 Loading model from HuggingFace...")
    model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-18k")
    print("✓ Model loaded successfully!")

    # Test texts, keyed by output file name
    test_samples = {
        # Basic samples
        "basic_english": "Hello, this is a test.",
        "basic_dhivehi": "އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",
        # Mixed language tests
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟",
        # Emotional expressions and sounds
        "with_laughter": "That was so funny! (laughs) ވަރަށް މަޖާ އެނގޭ! (laughs) I can't stop laughing!",
        # Complex emotional scenarios
        "happy_announcement": "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)",
        "achievement": "After years of hard work... (claps) finally! އެންމެ ފަހުން! I graduated! އަހަރެން ފުރިހަމަ ކުރީ! (claps) (claps) (laughs)",
    }

    print("\n🗣️ Generating speech samples...")
    generated_files = []  # (filename, number of samples) per successful sample

    for name, text in test_samples.items():
        # A failure on one sample should not stop the remaining ones
        try:
            print(f"🎤 Generating: {name}")
            print(f" Text: {text[:60]}{'...' if len(text) > 60 else ''}")
            output = model.generate(text)
            filename = f"{name}.wav"
            sf.write(filename, output, SAMPLE_RATE)
            generated_files.append((filename, len(output)))
            print(f" ✓ Saved: {filename} ({len(output) / SAMPLE_RATE:.2f}s)")
        except Exception as e:
            print(f" ❌ Failed to generate {name}: {e}")

    print("\n🎉 TTS generation completed!")
    print(f"📁 Generated {len(generated_files)} audio files:")
    total_duration = 0
    for filename, samples in generated_files:
        duration = samples / SAMPLE_RATE
        total_duration += duration
        print(f" - {filename:<25} ({duration:.2f}s)")
    print(f"\n📊 Total audio generated: {total_duration:.2f} seconds")

except ImportError as e:
    print("❌ Missing dependencies. Please install:")
    print(" pip install git+https://github.com/nari-labs/dia.git")
    print(" pip install soundfile")
    print(f" Error: {e}")
except Exception as e:
    print(f"❌ Error during TTS generation: {e}")
    print("💡 Make sure the model was uploaded correctly and is accessible")
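The durations reported by the script come from dividing each clip's sample count by the 44,100 Hz sample rate. The arithmetic can be checked on its own, without loading the model, with a minimal sketch like the one below. The helper name `summarize_durations` and the sample counts are hypothetical and chosen only for illustration.

```python
SAMPLE_RATE = 44100  # same rate the script passes to sf.write


def summarize_durations(generated_files, sample_rate=SAMPLE_RATE):
    """Return ({filename: seconds}, total_seconds) for (filename, n_samples) pairs."""
    durations = {name: n_samples / sample_rate for name, n_samples in generated_files}
    return durations, sum(durations.values())


# Hypothetical sample counts standing in for len(output) from model.generate()
durations, total = summarize_durations(
    [("basic_english.wav", 88200), ("basic_dhivehi.wav", 132300)]
)
for name, seconds in durations.items():
    print(f" - {name:<25} ({seconds:.2f}s)")
print(f"Total audio generated: {total:.2f} seconds")
```

Keeping the sample rate in one named constant avoids a mismatch between the rate used when writing the file and the rate used when reporting its length.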
The fine-tuning targets Dhivehi speech synthesis, with the goal of natural-sounding speech for Dhivehi text input.
Note: Training for this checkpoint was stopped at step 18k. The full run is available at alakxender/Dia-1.6B-dhivehi-ep1.
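To compare the 18k checkpoint with the full run, one option is to switch the repository id passed to `Dia.from_pretrained`. The sketch below assumes the full-run repository loads the same way as this one, which this card does not confirm. The aliases `"18k"` and `"full"` and both helper names are hypothetical.

```python
# Repository ids come from this model card; the short aliases are hypothetical.
CHECKPOINTS = {
    "18k": "alakxender/Dia-1.6B-dhivehi-18k",
    "full": "alakxender/Dia-1.6B-dhivehi-ep1",
}


def resolve_repo(alias: str) -> str:
    """Map a short alias to its Hugging Face repository id."""
    try:
        return CHECKPOINTS[alias]
    except KeyError:
        raise ValueError(
            f"Unknown checkpoint {alias!r}; choose from {sorted(CHECKPOINTS)}"
        ) from None


def load_checkpoint(alias: str = "18k"):
    """Load a checkpoint by alias (assumes both repos work with Dia.from_pretrained)."""
    from dia.model import Dia  # imported lazily so resolve_repo works without dia installed

    return Dia.from_pretrained(resolve_repo(alias))
```

For example, `model = load_checkpoint("full")` would attempt to download and load the full-run weights in place of the 18k checkpoint.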
This model is released under the Apache 2.0 License, following the original Dia model licensing.
Base model: nari-labs/Dia-1.6B