XTTS v2 Mobile - TorchScript Edition

✨ UPDATED: Now with proper TorchScript models ready for mobile deployment!

Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.

🎯 Key Features

  • TorchScript Format: Self-contained .ts files that run directly on mobile
  • Optimized for Mobile: Models processed with PyTorch Mobile optimizations
  • Multiple Variants: Choose based on your device capabilities
  • 17 Languages: Full multilingual support maintained
  • 24kHz Output: High-quality audio generation

📦 Model Variants

| Variant  | Size    | Memory  | Target Devices        | Quality   |
|----------|---------|---------|-----------------------|-----------|
| Original | 1.16 GB | ~1.5 GB | High-end (4 GB+ RAM)  | Best      |
| FP16     | 581 MB  | ~800 MB | Mid-range (3 GB+ RAM) | Excellent |

Recommendation: Use the FP16 variant on most devices; it offers the best balance of size, memory usage, and quality.

🚀 Quick Start

Download Models

from huggingface_hub import hf_hub_download

# Download FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts"
)
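
On Android you can also fetch the file at first launch instead of bundling it. A minimal Kotlin sketch using the same resolve URL pattern as the React Native section below; the destination filename and the lack of retries/checksums are simplifications:

import java.io.File
import java.net.URL

// Minimal sketch: fetch the FP16 model to app-private storage on first launch.
// Run this off the main thread; no retry or integrity checking is shown.
fun ensureModel(filesDir: File): File {
    val dest = File(filesDir, "xtts_infer_fp16.ts")
    if (!dest.exists()) {
        val url = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main/fp16/xtts_infer_fp16.ts"
        URL(url).openStream().use { input ->
            dest.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return dest
}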

Android Integration (Kotlin)

// Add to build.gradle. Full TorchScript (.ts) files are loaded with Module.load
// from the full runtime; the lite artifact (pytorch_android_lite) expects .ptl
// files saved for the lite interpreter. 1.13.1 is the most recent pytorch_android
// release on Maven Central - verify it can load your exported model.
dependencies {
    implementation 'org.pytorch:pytorch_android:1.13.1'
}

// Load and use model
import android.content.Context
import org.pytorch.IValue
import org.pytorch.Module

class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        // Loads the self-contained TorchScript file from local storage
        module = Module.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        // forward(text, language) returns a tensor of raw PCM samples (24 kHz)
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()

        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
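
The returned FloatArray is raw PCM at 24 kHz (see Technical Details). A minimal playback sketch using Android's AudioTrack; mono output and static-mode transfer are assumptions here, and very long clips may be better served by streaming mode:

import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack

// Plays the raw float PCM returned by generateSpeech().
// MODE_STATIC: the whole clip is written once, then played.
fun playAudio(samples: FloatArray) {
    val format = AudioFormat.Builder()
        .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
        .setSampleRate(24_000)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .build()
    val track = AudioTrack.Builder()
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build()
        )
        .setAudioFormat(format)
        .setTransferMode(AudioTrack.MODE_STATIC)
        .setBufferSizeInBytes(samples.size * Float.SIZE_BYTES)
        .build()
    track.write(samples, 0, samples.size, AudioTrack.WRITE_BLOCKING)
    track.play()
}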

iOS Integration (Swift)

// LibTorch exposes a C++ API, so it cannot be imported into Swift directly.
// TorchModule below is assumed to be a thin Objective-C++ wrapper around
// torch::jit::load (as in PyTorch's iOS demo apps), exposed to Swift via a
// bridging header; forward(_:) and toArray() are methods that wrapper defines.

class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }

        // The wrapper forwards (text, language) to the TorchScript module and
        // converts the returned audio tensor into a Swift [Float]
        let output = module.forward([text, language])
        return output.toArray()
    }
}

React Native Integration

// Download model from HuggingFace (uses react-native-fs)
import RNFS from 'react-native-fs';
import { NativeModules } from 'react-native';

const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
    const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
    const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;

    await RNFS.downloadFile({
        fromUrl: url,
        toFile: destPath,
        background: true
    }).promise;

    return destPath;
}

// Initialize the custom native module (see the bridge sketch below)
const { XTTSModule } = NativeModules;
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);

// Generate speech
const audio = await XTTSModule.speak("Hello world", "en");
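
XTTSModule above is not an off-the-shelf package: it is a native module you implement and register yourself. A hypothetical Android-side bridge matching those JS calls (the class name and error codes are invented, and the ReactPackage registration boilerplate is omitted):

import com.facebook.react.bridge.Promise
import com.facebook.react.bridge.ReactApplicationContext
import com.facebook.react.bridge.ReactContextBaseJavaModule
import com.facebook.react.bridge.ReactMethod
import com.facebook.react.bridge.WritableNativeArray
import org.pytorch.IValue
import org.pytorch.Module

// Hypothetical Android bridge exposing the model as "XTTSModule" to JS
class XTTSBridge(context: ReactApplicationContext) : ReactContextBaseJavaModule(context) {
    private var module: Module? = null

    override fun getName() = "XTTSModule"

    @ReactMethod
    fun initialize(modelPath: String, promise: Promise) {
        try {
            module = Module.load(modelPath)
            promise.resolve(null)
        } catch (e: Exception) {
            promise.reject("LOAD_FAILED", e)
        }
    }

    @ReactMethod
    fun speak(text: String, language: String, promise: Promise) {
        val samples = module?.forward(IValue.from(text), IValue.from(language))
            ?.toTensor()?.dataAsFloatArray
        if (samples == null) {
            promise.reject("NOT_INITIALIZED", "Call initialize() first")
            return
        }
        // Crossing the bridge with a large float array is slow; shown for clarity only
        val out = WritableNativeArray()
        samples.forEach { out.pushDouble(it.toDouble()) }
        promise.resolve(out)
    }
}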

📊 Memory Requirements

| Device RAM | Recommended Variant | Expected Performance     |
|------------|---------------------|--------------------------|
| < 3 GB     | FP16 with streaming | May require optimization |
| 3-4 GB     | FP16                | Smooth performance       |
| 4 GB+      | Original or FP16    | Excellent performance    |

🌍 Supported Languages

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • pl - Polish
  • tr - Turkish
  • ru - Russian
  • nl - Dutch
  • cs - Czech
  • ar - Arabic
  • zh - Chinese
  • ja - Japanese
  • ko - Korean
  • hu - Hungarian
  • hi - Hindi

🔧 Technical Details

  • Model Architecture: XTTS v2 with GPT-style backbone
  • Export Method: TorchScript with mobile optimizations
  • PyTorch Version: 2.8.0 (use matching LibTorch version)
  • Sample Rate: 24,000 Hz
  • Quantization: FP16 uses half-precision floating point

💡 Tips for Mobile Deployment

  1. Memory Management:

    • Load model once at app startup
    • Keep model in memory for multiple generations
    • Use PyTorchAndroid.setNumThreads(1) to reduce memory usage (setNumThreads is a static method on org.pytorch.PyTorchAndroid, not on Module)
  2. Performance Optimization:

    • Warm up the model with a dummy input on first load
    • Use the FP16 variant for the best balance
    • Consider chunking long texts (warm-up and chunking are sketched after this list)
  3. Error Handling:

    try {
        module = Module.load(modelPath)
    } catch (e: Exception) {
        // Fall back to server-side TTS
        Log.e("XTTS", "Failed to load model: ${e.message}")
    }
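
A combined sketch of the warm-up and chunking tips above. The punctuation-based sentence split is an assumption for illustration, not something the model requires:

import org.pytorch.IValue
import org.pytorch.Module

// Warm-up: one throwaway generation so the first real request is fast
fun warmUp(module: Module) {
    module.forward(IValue.from("Hello."), IValue.from("en"))
}

// Chunking: split on sentence boundaries, generate per chunk, concatenate
fun generateLongText(module: Module, text: String, language: String): FloatArray {
    val chunks = text.split(Regex("(?<=[.!?])\\s+")).filter { it.isNotBlank() }
    val parts = chunks.map {
        module.forward(IValue.from(it), IValue.from(language)).toTensor().dataAsFloatArray
    }
    val out = FloatArray(parts.sumOf { it.size })
    var offset = 0
    for (p in parts) {
        p.copyInto(out, destinationOffset = offset)
        offset += p.size
    }
    return out
}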
    

πŸ“ Changelog

  • 2024-09-23: Initial release with TorchScript models
    • Added Original and FP16 variants
    • Optimized for PyTorch Mobile
    • Fixed compatibility issues

📄 License

Apache 2.0

πŸ™ Acknowledgments

Based on the official XTTS v2 model. Optimized for mobile deployment.

📚 Citation

@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}

⚠️ Important Notes

  • These are TorchScript models (.ts files), not PyTorch checkpoints (.pth)
  • Models are self-contained and include all necessary weights
  • No additional tokenizer files needed - tokenization is built into the model
  • INT8 quantization not available for ARM-based systems