Dia TTS - Dhivehi Fine-tuned Model

This is a fine-tuned version of nari-labs/Dia-1.6B specifically trained for Dhivehi (Maldivian) text-to-speech synthesis.

Model Description

Base Model: Dia-1.6B
Language: Dhivehi (dv), including mixed Dhivehi and English input
Task: Text-to-Speech (TTS)
Fine-tuning: Specialized for Dhivehi audio synthesis

Usage

# Install the Dia library and soundfile first:
# pip install git+https://github.com/nari-labs/dia.git
# pip install soundfile

SAMPLE_RATE = 44100  # Dia generates 44.1 kHz audio

print("🎤 Testing Dhivehi Dia TTS model...")

try:
    # Import inside the try block so that a missing package is reported by
    # the ImportError handler below instead of crashing the script
    from dia.model import Dia
    import soundfile as sf

    # Load the fine-tuned checkpoint from the Hugging Face Hub
    print("📥 Loading model from Hugging Face...")
    model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-18k")
    print("✓ Model loaded successfully!")

    # Test prompts: English, Dhivehi, mixed-language, and non-verbal cues
    test_samples = {
        # Basic samples
        "basic_english": "Hello, this is a test.",
        "basic_dhivehi": "އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",

        # Mixed-language test
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟",

        # Emotional expressions and sounds
        "with_laughter": "That was so funny! (laughs) ވަރަށް މަޖާ އެނގޭ! (laughs) I can't stop laughing!",

        # Complex emotional scenarios
        "happy_announcement": "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)",
        "achievement": "After years of hard work... (claps) finally! އެންމެ ފަހުން! I graduated! އަހަރެން ފުރިހަމަ ކުރީ! (claps) (claps) (laughs)"
    }

    print("\n🗣️  Generating speech samples...")
    generated_files = []  # (filename, number of samples) for each successful sample

    for name, text in test_samples.items():
        try:
            print(f"🎤 Generating: {name}")
            print(f"   Text: {text[:60]}{'...' if len(text) > 60 else ''}")

            output = model.generate(text)  # waveform as a NumPy array
            filename = f"{name}.wav"
            sf.write(filename, output, SAMPLE_RATE)
            generated_files.append((filename, len(output)))
            print(f"   ✓ Saved: {filename} ({len(output) / SAMPLE_RATE:.2f}s)")

        except Exception as e:
            # Report the failure and continue with the remaining samples
            print(f"   ❌ Failed to generate {name}: {e}")

    print("\n🎉 TTS generation completed!")
    print(f"📁 Generated {len(generated_files)} audio files:")

    total_duration = 0.0
    for filename, num_samples in generated_files:
        duration = num_samples / SAMPLE_RATE
        total_duration += duration
        print(f"   - {filename:<25} ({duration:.2f}s)")

    print(f"\n📊 Total audio generated: {total_duration:.2f} seconds")

except ImportError as e:
    print("❌ Missing dependencies. Please install:")
    print("   pip install git+https://github.com/nari-labs/dia.git")
    print("   pip install soundfile")
    print(f"   Error: {e}")

except Exception as e:
    print(f"❌ Error during TTS generation: {e}")
    print("💡 Make sure you have accepted the model's access conditions on Hugging Face and are logged in")
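The test prompts mix plain text with parenthesized non-verbal cues such as (laughs) and (claps). Before generating, it can help to list which cues a prompt contains, or to produce a cue-free variant for comparison. The sketch below does this with a regular expression; the helper names find_cues and strip_cues are hypothetical, and the pattern simply matches any lowercase word or words in parentheses. It is not a list of cues the model is known to support.

```python
import re

# Matches parenthesized lowercase cues such as "(laughs)" or "(clears throat)".
# Generic pattern only: it does not check against cues the model supports.
CUE_PATTERN = re.compile(r"\(([a-z]+(?: [a-z]+)*)\)")


def find_cues(text: str) -> list[str]:
    """Return the non-verbal cues in a prompt, in order of appearance."""
    return CUE_PATTERN.findall(text)


def strip_cues(text: str) -> str:
    """Remove cues and collapse the extra whitespace they leave behind."""
    without_cues = CUE_PATTERN.sub("", text)
    return " ".join(without_cues.split())


if __name__ == "__main__":
    prompt = "(laughs) Guess what? I got the job! (claps) (claps) (laughs)"
    print(find_cues(prompt))   # ['laughs', 'claps', 'claps', 'laughs']
    print(strip_cues(prompt))  # Guess what? I got the job!
```

Generating both the original prompt and its strip_cues variant is one way to hear how much the cues change the output.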

Training Details

Base Model: nari-labs/Dia-1.6B
Training Data: Dhivehi audio dataset
Fine-tuning Approach: Direct training on Dhivehi audio without language tags
Checkpoint: Step 18,000

Model Performance

This model has been fine-tuned specifically for Dhivehi speech synthesis and aims to produce natural-sounding speech from Dhivehi text input.

Note: This checkpoint was taken at step 18k, before training finished. The full run is available at alakxender/Dia-1.6B-dhivehi-ep1.

Limitations

Optimized specifically for the Dhivehi language
May not perform well on other languages
Performance depends on input text quality and pronunciation patterns
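Because the model is tuned for Dhivehi, which is written in the Thaana script (Unicode block U+0780 to U+07BF), a quick measure of how much of a prompt is Thaana can flag inputs that are mostly in another language. The sketch below is illustrative only: the helper names and the idea of using the Thaana share as a signal are assumptions, not part of the model or the Dia library.

```python
# Thaana, the script used to write Dhivehi, occupies U+0780 to U+07BF.
THAANA_FIRST, THAANA_LAST = 0x0780, 0x07BF


def is_thaana(ch: str) -> bool:
    """True if the character lies in the Thaana Unicode block."""
    return THAANA_FIRST <= ord(ch) <= THAANA_LAST


def thaana_ratio(text: str) -> float:
    """Share of letter characters in `text` that are Thaana.

    Thaana vowel signs (fili) are combining marks rather than letters, so
    they are counted via the block range instead of str.isalpha().
    Punctuation, digits and spaces are ignored.
    """
    letters = [ch for ch in text if ch.isalpha() or is_thaana(ch)]
    if not letters:
        return 0.0
    return sum(is_thaana(ch) for ch in letters) / len(letters)


if __name__ == "__main__":
    print(thaana_ratio("Hello, this is a test."))  # 0.0
    print(thaana_ratio("\u0780\u07a7\u078d\u07aa \u061f"))  # 1.0
```

A purely English prompt such as "Hello, this is a test." scores 0.0, while Dhivehi text scores close to 1.0; mixed prompts fall in between.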

License

This model is released under the Apache 2.0 License, following the original Dia model licensing.

Access: This model is gated. You need to agree to share your contact information on the Hugging Face model page before you can download it.