---
license: mit
base_model:
- openai/whisper-large-v3-turbo
tags:
- whisper
- faster
- int8
- ct2
- turbo
---

# Whisper Large v3 Turbo - CTranslate2

This is a CTranslate2-optimized version of OpenAI's Whisper Large v3 Turbo model for automatic speech recognition (ASR).

## Model Description

This model is a converted version of the original Whisper Large v3 Turbo model, optimized for inference with CTranslate2, a C++ and Python library for efficient inference with Transformer models. CTranslate2 provides:

- **Faster inference**: optimized implementations of attention mechanisms and feed-forward networks
- **Lower memory usage**: quantization support and memory-efficient attention
- **Better throughput**: batching and parallel-processing optimizations
- **Cross-platform compatibility**: support for CPU and GPU inference

## Conversion

The model was converted with the following command:

```bash
ct2-transformers-converter --model openai/whisper-large-v3-turbo --output_dir whisper-large-v3-turbo-ct2-int8 --quantization int8 --copy_files tokenizer.json preprocessor_config.json
```

The conversion applies **int8 quantization**, which provides several benefits:

- **Reduced disk space**: significantly smaller model size than the original float32 weights
- **Lower memory consumption**: requires less RAM during inference
- **Maintained accuracy**: minimal quality loss for substantial efficiency gains
- **Faster loading**: less time to load the model from disk
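As a rough illustration of the disk-space benefit, the weight storage can be estimated from the parameter count (assuming roughly 809M parameters for the turbo model; quantization metadata and any non-quantized layers add some overhead on top of this):

```python
# Back-of-the-envelope size estimate for int8 vs. float32 weight storage.
# Assumption: Whisper large-v3-turbo has roughly 809M parameters.
params = 809_000_000

float32_gib = params * 4 / 1024**3  # 4 bytes per float32 weight -> ~3.0 GiB
int8_gib = params * 1 / 1024**3     # 1 byte per int8 weight -> ~0.75 GiB

print(f"float32: ~{float32_gib:.1f} GiB, int8: ~{int8_gib:.2f} GiB")
```

The int8 checkpoint therefore needs roughly a quarter of the disk space of the float32 original, which also explains the faster load times.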

## Original Model

This model is based on OpenAI's Whisper Large v3 Turbo, a state-of-the-art automatic speech recognition model that:

- Supports 99 languages
- Provides high-quality transcription and translation
- Runs significantly faster than previous large Whisper versions with minimal loss in accuracy
- Handles a wide range of audio conditions and accents

## Usage

To use this model, install CTranslate2 and faster-whisper, the Whisper integration built on top of it:

```bash
pip install ctranslate2 faster-whisper
```

```python
from faster_whisper import WhisperModel

model_path = "path/to/whisper-large-v3-turbo-ct2-int8"
model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
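When moving from CPU to GPU, the `compute_type` should usually change as well. A minimal sketch of reasonable pairings (an assumption based on CTranslate2's commonly supported compute types, not a requirement of this model; `pick_compute_type` is a hypothetical helper for illustration):

```python
# Suggested device -> compute_type pairings for faster-whisper's
# WhisperModel(device=..., compute_type=...). These are suggestions only;
# "default" lets CTranslate2 choose a type supported by the hardware.
RECOMMENDED_COMPUTE = {
    "cpu": "int8",           # matches this model's int8 quantization
    "cuda": "int8_float16",  # int8 weights with float16 compute on GPU
}

def pick_compute_type(device: str) -> str:
    return RECOMMENDED_COMPUTE.get(device, "default")

# Usage:
#   WhisperModel(model_path, device="cuda",
#                compute_type=pick_compute_type("cuda"))
```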

## Performance

This CTranslate2 version provides significant performance improvements over the original PyTorch implementation:

- Up to 4x faster inference
- Reduced memory consumption
- Support for quantization
- Optimized for both CPU and GPU inference

## Supported Languages

The same as the original Whisper Large v3 Turbo, including:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh.
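The transcription language can be forced instead of auto-detected by passing an ISO 639-1 code as the `language` argument of `transcribe()`. A small sketch (the mapping below is an illustrative subset only, not the full 99-language table):

```python
# Whisper addresses languages by ISO 639-1 codes; passing `language=` to
# transcribe() skips automatic language detection.
# Illustrative subset only -- not the complete list.
LANG_CODES = {
    "English": "en",
    "German": "de",
    "French": "fr",
    "Japanese": "ja",
    "Spanish": "es",
}

# e.g. model.transcribe("audio.wav", language=LANG_CODES["German"])
```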

## Model Card

- **Developed by**: OpenAI (original model), converted to the CTranslate2 format
- **Model type**: Automatic speech recognition
- **Language(s)**: Multilingual (99 languages)
- **License**: MIT
- **Model size**: 809M parameters