XTTS v2 Mobile - TorchScript Edition

✨ UPDATED: Now with proper TorchScript models ready for mobile deployment!

Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.

🎯 Key Features

  • TorchScript Format: Self-contained .ts files that run directly on mobile
  • Optimized for Mobile: Models processed with PyTorch Mobile optimizations
  • Multiple Variants: Choose based on your device capabilities
  • 17 Languages: Full multilingual support maintained
  • 24kHz Output: High-quality audio generation

📦 Model Variants

| Variant  | Size    | Memory  | Target Devices        | Quality   |
|----------|---------|---------|-----------------------|-----------|
| Original | 1.16 GB | ~1.5 GB | High-end (4 GB+ RAM)  | Best      |
| FP16     | 581 MB  | ~800 MB | Mid-range (3 GB+ RAM) | Excellent |

Recommendation: Use the FP16 variant on most devices; it offers the best balance of size, memory usage, and quality.

🚀 Quick Start

Download Models

from huggingface_hub import hf_hub_download

# Download FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts"
)
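
On Android you can also fetch the file at first launch instead of bundling it. A minimal Kotlin sketch using the same resolve URL pattern as the React Native section below; the destination filename and the lack of retries/checksums are simplifications:

import java.io.File
import java.net.URL

// Minimal sketch: fetch the FP16 model to app-private storage on first launch.
// Run this off the main thread; no retry or integrity checking is shown.
fun ensureModel(filesDir: File): File {
    val dest = File(filesDir, "xtts_infer_fp16.ts")
    if (!dest.exists()) {
        val url = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main/fp16/xtts_infer_fp16.ts"
        URL(url).openStream().use { input ->
            dest.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return dest
}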

Android Integration (Kotlin)

// Add to build.gradle. Full TorchScript (.ts) files are loaded with Module.load
// from the full runtime; the lite artifact (pytorch_android_lite) expects .ptl
// files saved for the lite interpreter. 1.13.1 is the most recent pytorch_android
// release on Maven Central - verify it can load your exported model.
dependencies {
    implementation 'org.pytorch:pytorch_android:1.13.1'
}

// Load and use model
import android.content.Context
import org.pytorch.IValue
import org.pytorch.Module

class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        // Loads the self-contained TorchScript file from local storage
        module = Module.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        // forward(text, language) returns a tensor of raw PCM samples (24 kHz)
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()

        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
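
The returned FloatArray is raw PCM at 24 kHz (see Technical Details). A minimal playback sketch using Android's AudioTrack; mono output and static-mode transfer are assumptions here, and very long clips may be better served by streaming mode:

import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack

// Plays the raw float PCM returned by generateSpeech().
// MODE_STATIC: the whole clip is written once, then played.
fun playAudio(samples: FloatArray) {
    val format = AudioFormat.Builder()
        .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
        .setSampleRate(24_000)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .build()
    val track = AudioTrack.Builder()
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build()
        )
        .setAudioFormat(format)
        .setTransferMode(AudioTrack.MODE_STATIC)
        .setBufferSizeInBytes(samples.size * Float.SIZE_BYTES)
        .build()
    track.write(samples, 0, samples.size, AudioTrack.WRITE_BLOCKING)
    track.play()
}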

iOS Integration (Swift)

// LibTorch exposes a C++ API, so it cannot be imported into Swift directly.
// TorchModule below is assumed to be a thin Objective-C++ wrapper around
// torch::jit::load (as in PyTorch's iOS demo apps), exposed to Swift via a
// bridging header; forward(_:) and toArray() are methods that wrapper defines.

class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }

        // The wrapper forwards (text, language) to the TorchScript module and
        // converts the returned audio tensor into a Swift [Float]
        let output = module.forward([text, language])
        return output.toArray()
    }
}

React Native Integration

// Download model from HuggingFace (uses react-native-fs)
import RNFS from 'react-native-fs';
import { NativeModules } from 'react-native';

const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
    const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
    const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;

    await RNFS.downloadFile({
        fromUrl: url,
        toFile: destPath,
        background: true
    }).promise;

    return destPath;
}

// Initialize the custom native module (see the bridge sketch below)
const { XTTSModule } = NativeModules;
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);

// Generate speech
const audio = await XTTSModule.speak("Hello world", "en");
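
XTTSModule above is not an off-the-shelf package: it is a native module you implement and register yourself. A hypothetical Android-side bridge matching those JS calls (the class name and error codes are invented, and the ReactPackage registration boilerplate is omitted):

import com.facebook.react.bridge.Promise
import com.facebook.react.bridge.ReactApplicationContext
import com.facebook.react.bridge.ReactContextBaseJavaModule
import com.facebook.react.bridge.ReactMethod
import com.facebook.react.bridge.WritableNativeArray
import org.pytorch.IValue
import org.pytorch.Module

// Hypothetical Android bridge exposing the model as "XTTSModule" to JS
class XTTSBridge(context: ReactApplicationContext) : ReactContextBaseJavaModule(context) {
    private var module: Module? = null

    override fun getName() = "XTTSModule"

    @ReactMethod
    fun initialize(modelPath: String, promise: Promise) {
        try {
            module = Module.load(modelPath)
            promise.resolve(null)
        } catch (e: Exception) {
            promise.reject("LOAD_FAILED", e)
        }
    }

    @ReactMethod
    fun speak(text: String, language: String, promise: Promise) {
        val samples = module?.forward(IValue.from(text), IValue.from(language))
            ?.toTensor()?.dataAsFloatArray
        if (samples == null) {
            promise.reject("NOT_INITIALIZED", "Call initialize() first")
            return
        }
        // Crossing the bridge with a large float array is slow; shown for clarity only
        val out = WritableNativeArray()
        samples.forEach { out.pushDouble(it.toDouble()) }
        promise.resolve(out)
    }
}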

📊 Memory Requirements

| Device RAM | Recommended Variant | Expected Performance     |
|------------|---------------------|--------------------------|
| < 3 GB     | FP16 with streaming | May require optimization |
| 3-4 GB     | FP16                | Smooth performance       |
| 4 GB+      | Original or FP16    | Excellent performance    |

🌍 Supported Languages

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • pl - Polish
  • tr - Turkish
  • ru - Russian
  • nl - Dutch
  • cs - Czech
  • ar - Arabic
  • zh - Chinese
  • ja - Japanese
  • ko - Korean
  • hu - Hungarian
  • hi - Hindi

🔧 Technical Details

  • Model Architecture: XTTS v2 with GPT-style backbone
  • Export Method: TorchScript with mobile optimizations
  • PyTorch Version: 2.8.0 (use matching LibTorch version)
  • Sample Rate: 24,000 Hz
  • Quantization: FP16 uses half-precision floating point

💡 Tips for Mobile Deployment

  1. Memory Management:

    • Load model once at app startup
    • Keep model in memory for multiple generations
    • Use PyTorchAndroid.setNumThreads(1) to reduce memory usage (setNumThreads is a static method on org.pytorch.PyTorchAndroid, not on Module)
  2. Performance Optimization:

    • Warm up the model with a dummy input on first load
    • Use the FP16 variant for the best balance
    • Consider chunking long texts (warm-up and chunking are sketched after this list)
  3. Error Handling:

    try {
        module = Module.load(modelPath)
    } catch (e: Exception) {
        // Fall back to server-side TTS
        Log.e("XTTS", "Failed to load model: ${e.message}")
    }
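
A combined sketch of the warm-up and chunking tips above. The punctuation-based sentence split is an assumption for illustration, not something the model requires:

import org.pytorch.IValue
import org.pytorch.Module

// Warm-up: one throwaway generation so the first real request is fast
fun warmUp(module: Module) {
    module.forward(IValue.from("Hello."), IValue.from("en"))
}

// Chunking: split on sentence boundaries, generate per chunk, concatenate
fun generateLongText(module: Module, text: String, language: String): FloatArray {
    val chunks = text.split(Regex("(?<=[.!?])\\s+")).filter { it.isNotBlank() }
    val parts = chunks.map {
        module.forward(IValue.from(it), IValue.from(language)).toTensor().dataAsFloatArray
    }
    val out = FloatArray(parts.sumOf { it.size })
    var offset = 0
    for (p in parts) {
        p.copyInto(out, destinationOffset = offset)
        offset += p.size
    }
    return out
}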
    

πŸ“ Changelog

  • 2024-09-23: Initial release with TorchScript models
    • Added Original and FP16 variants
    • Optimized for PyTorch Mobile
    • Fixed compatibility issues

📄 License

Apache 2.0

πŸ™ Acknowledgments

Based on the official XTTS v2 model. Optimized for mobile deployment.

📚 Citation

@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}

⚠️ Important Notes

  • These are TorchScript models (.ts files), not PyTorch checkpoints (.pth)
  • Models are self-contained and include all necessary weights
  • No additional tokenizer files needed - tokenization is built into the model
  • INT8 quantization not available for ARM-based systems