Whisper Large v3 Turbo - CTranslate2

This is a CTranslate2-optimized version of OpenAI's Whisper Large v3 Turbo model for automatic speech recognition (ASR).

Model Description

This model is a converted version of the original Whisper Large v3 Turbo model, optimized for inference using CTranslate2. CTranslate2 is a C++ and Python library for efficient inference with Transformer models, providing:

  • Faster inference: Optimized implementations of attention mechanisms and feed-forward networks
  • Lower memory usage: Quantization support and memory-efficient attention
  • Better throughput: Batching and parallel processing optimizations
  • Cross-platform compatibility: Support for CPU and GPU inference

Conversion

This model has been converted using the following command:

```shell
ct2-transformers-converter --model openai/whisper-large-v3-turbo \
    --output_dir whisper-large-v3-turbo-ct2-int8 \
    --quantization int8 \
    --copy_files tokenizer.json preprocessor_config.json
```

The conversion includes int8 quantization, which provides several benefits:

  • Reduced disk space: Significantly smaller model size compared to the original float32 version
  • Lower memory consumption: Requires less RAM during inference
  • Maintained accuracy: Minimal quality loss while providing substantial efficiency gains
  • Faster loading: Reduced time to load the model from disk

Original Model

This model is based on OpenAI's Whisper Large v3 Turbo, which is a state-of-the-art automatic speech recognition model that:

  • Supports 99 languages
  • Provides high-quality transcription and translation
  • Features improved accuracy and speed compared to previous Whisper versions
  • Handles various audio conditions and accents

Usage

To use this model, install CTranslate2 and faster-whisper, a Whisper integration built on top of CTranslate2:

```shell
pip install ctranslate2 faster-whisper
```

Then load the converted model and transcribe an audio file:

```python
from faster_whisper import WhisperModel

# Path to the directory containing the converted CTranslate2 model
model_path = "path/to/whisper-large-v3-turbo-ct2"
model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav", beam_size=5)

print("Detected language: %s (p=%.2f)" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
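If you want subtitle-style output instead of raw seconds, the segment timestamps can be reformatted with a small helper. This is a sketch, not part of faster-whisper's API; `format_timestamp` is a hypothetical function shown here for illustration:

```python
def format_timestamp(seconds: float) -> str:
    """Render a segment time in SRT-style HH:MM:SS,mmm form."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

# 83.5 seconds into the audio:
print(format_timestamp(83.5))  # 00:01:23,500
```

Each `segment` yielded by `transcribe` exposes `start` and `end` in seconds, so `format_timestamp(segment.start)` slots directly into the loop above.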

Performance

This CTranslate2 version provides significant performance improvements over the original PyTorch implementation:

  • Up to 4x faster inference, depending on hardware and workload
  • Reduced memory consumption
  • Support for quantization
  • Optimized for both CPU and GPU inference
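Since the model runs on both CPU and GPU, the device and compute type usually need to be chosen together. The sketch below shows one reasonable pairing; `choose_runtime` is a hypothetical helper, and the compute-type names (`int8`, `float16`, `int8_float16`) are the identifiers CTranslate2 accepts:

```python
def choose_runtime(cuda_available: bool) -> dict:
    """Pick a device/compute_type pairing for WhisperModel.

    int8 is the natural fit for this quantized model on CPU; on GPU,
    float16 (or int8_float16) is typically faster.
    """
    if cuda_available:
        return {"device": "cuda", "compute_type": "float16"}
    return {"device": "cpu", "compute_type": "int8"}

# Usage (model loading itself requires the model files):
# model = WhisperModel(model_path, **choose_runtime(cuda_available=True))
print(choose_runtime(cuda_available=False))
```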

Supported Languages

Same as the original Whisper Large v3 Turbo: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh.

Model Card

  • Developed by: OpenAI (original model); converted to the CTranslate2 format
  • Model type: Automatic Speech Recognition
  • Language(s): Multilingual (99 languages)
  • License: MIT
  • Model size: 809M parameters (large-v3-turbo)