Whisper Large v3 Turbo - CTranslate2

This is a CTranslate2-optimized version of OpenAI's Whisper Large v3 Turbo model for automatic speech recognition (ASR).

Model Description

This model is a converted version of the original Whisper Large v3 Turbo model, optimized for inference using CTranslate2. CTranslate2 is a C++ and Python library for efficient inference with Transformer models, providing:

  • Faster inference: Optimized implementations of attention mechanisms and feed-forward networks
  • Lower memory usage: Quantization support and memory-efficient attention
  • Better throughput: Batching and parallel processing optimizations
  • Cross-platform compatibility: Support for CPU and GPU inference

Conversion

This model has been converted using the following command:

```shell
ct2-transformers-converter --model openai/whisper-large-v3-turbo \
    --output_dir whisper-large-v3-turbo-ct2-int8 \
    --quantization int8 \
    --copy_files tokenizer.json preprocessor_config.json
```

The conversion includes int8 quantization, which provides several benefits:

  • Reduced disk space: Significantly smaller model size compared to the original float32 version
  • Lower memory consumption: Requires less RAM during inference
  • Maintained accuracy: Minimal quality loss while providing substantial efficiency gains
  • Faster loading: Reduced time to load the model from disk

Original Model

This model is based on OpenAI's Whisper Large v3 Turbo, which is a state-of-the-art automatic speech recognition model that:

  • Supports 99 languages
  • Provides high-quality transcription and translation
  • Features improved accuracy and speed compared to previous Whisper versions
  • Handles various audio conditions and accents

Usage

To use this model, install CTranslate2 and faster-whisper, a Whisper integration built on top of CTranslate2:

```shell
pip install ctranslate2 faster-whisper
```

Then load the converted model and transcribe an audio file:

```python
from faster_whisper import WhisperModel

# Path to the directory containing the converted CTranslate2 model
model_path = "path/to/whisper-large-v3-turbo-ct2"
model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav", beam_size=5)

print("Detected language: %s (p=%.2f)" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
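If you want subtitle-style output instead of raw seconds, the segment timestamps can be reformatted with a small helper. This is a sketch, not part of faster-whisper's API; `format_timestamp` is a hypothetical function shown here for illustration:

```python
def format_timestamp(seconds: float) -> str:
    """Render a segment time in SRT-style HH:MM:SS,mmm form."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

# 83.5 seconds into the audio:
print(format_timestamp(83.5))  # 00:01:23,500
```

Each `segment` yielded by `transcribe` exposes `start` and `end` in seconds, so `format_timestamp(segment.start)` slots directly into the loop above.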

Performance

This CTranslate2 version provides significant performance improvements over the original PyTorch implementation:

  • Up to 4x faster inference, depending on hardware and workload
  • Reduced memory consumption
  • Support for quantization
  • Optimized for both CPU and GPU inference
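Since the model runs on both CPU and GPU, the device and compute type usually need to be chosen together. The sketch below shows one reasonable pairing; `choose_runtime` is a hypothetical helper, and the compute-type names (`int8`, `float16`, `int8_float16`) are the identifiers CTranslate2 accepts:

```python
def choose_runtime(cuda_available: bool) -> dict:
    """Pick a device/compute_type pairing for WhisperModel.

    int8 is the natural fit for this quantized model on CPU; on GPU,
    float16 (or int8_float16) is typically faster.
    """
    if cuda_available:
        return {"device": "cuda", "compute_type": "float16"}
    return {"device": "cpu", "compute_type": "int8"}

# Usage (model loading itself requires the model files):
# model = WhisperModel(model_path, **choose_runtime(cuda_available=True))
print(choose_runtime(cuda_available=False))
```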

Supported Languages

Same as the original Whisper Large v3 Turbo: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh.

Model Card

  • Developed by: OpenAI (original model); converted to the CTranslate2 format
  • Model type: Automatic Speech Recognition
  • Language(s): Multilingual (99 languages)
  • License: MIT
  • Model size: 809M parameters (large-v3-turbo)