Ryan Metcalfe committed: readme : add OpenVINO support details (#1112)

README.md (CHANGED)
@@ -22,6 +22,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper)

- [Partial GPU support for NVIDIA via cuBLAS](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [BLAS CPU support via OpenBLAS](https://github.com/ggerganov/whisper.cpp#blas-cpu-support-via-openblas)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)

Supported platforms:
@@ -311,6 +312,85 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the instructions:

For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggerganov/whisper.cpp/pull/566).
## OpenVINO support

On platforms that support [OpenVINO](https://github.com/openvinotoolkit/openvino), the Encoder inference can be executed
on OpenVINO-supported devices, including x86 CPUs and Intel GPUs (integrated & discrete).
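
If you're unsure which OpenVINO devices are present on your system, you can list them with a short Python snippet (a sketch, assuming the `openvino` Python package is installed, e.g. via the conversion requirements used below):

```python
# List the OpenVINO devices visible on this machine, e.g. ['CPU', 'GPU'].
from openvino.runtime import Core

core = Core()
print(core.available_devices)
```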

This can result in a significant speedup in encoder performance. Here are the instructions for generating the OpenVINO model and using it with `whisper.cpp`:

- First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended.

  Windows:
  ```
  cd models
  python -m venv openvino_conv_env
  openvino_conv_env\Scripts\activate
  python -m pip install --upgrade pip
  pip install -r openvino-conversion-requirements.txt
  ```

  Linux and macOS:
  ```
  cd models
  python3 -m venv openvino_conv_env
  source openvino_conv_env/bin/activate
  python -m pip install --upgrade pip
  pip install -r openvino-conversion-requirements.txt
  ```

- Generate an OpenVINO encoder model. For example, to generate a `base.en` model, use:

  ```
  python convert-whisper-to-openvino.py --model base.en
  ```

  This will produce `ggml-base.en-encoder-openvino.xml`/`.bin` IR model files. It's recommended to relocate these to the same folder as the `ggml` models, as that
  is the default location that the OpenVINO extension will search at runtime.
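
  As a quick sanity check, you can load the generated IR with the OpenVINO Python API and inspect its inputs and outputs (a sketch, assuming the same virtual environment as above and a `base.en` conversion):

  ```python
  # Load the converted encoder IR and print its input/output tensor shapes.
  from openvino.runtime import Core

  core = Core()
  model = core.read_model("ggml-base.en-encoder-openvino.xml")
  for inp in model.inputs:
      print("input :", inp.any_name, inp.partial_shape)
  for out in model.outputs:
      print("output:", out.any_name, out.partial_shape)
  ```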

- Build `whisper.cpp` with OpenVINO support:

  Download the OpenVINO package from the [release page](https://github.com/openvinotoolkit/openvino/releases). The recommended version to use is [2023.0.0](https://github.com/openvinotoolkit/openvino/releases/tag/2023.0.0).

  After downloading and extracting the package onto your development system, set up the required environment by sourcing the setupvars script. For example:

  Linux:
  ```bash
  source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
  ```

  Windows (cmd):
  ```
  C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
  ```

  Then build the project using CMake:
  ```bash
  mkdir -p build && cd build
  cmake -DWHISPER_OPENVINO=1 ..
  # compile (CMake's generic build driver works with any generator)
  cmake --build . --config Release
  ```

- Run the examples as usual. For example:
  ```bash
  ./main -m models/ggml-base.en.bin -f samples/jfk.wav

  ...

  whisper_ctx_init_openvino_encoder: loading OpenVINO model from 'models/ggml-base.en-encoder-openvino.xml'
  whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
  whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = GPU, cache_dir = models/ggml-base.en-encoder-openvino-cache
  whisper_ctx_init_openvino_encoder: OpenVINO model loaded

  system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 1 |

  ...
  ```

  The first run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob will get
  cached for the next run.
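
If you are integrating `whisper.cpp` as a library rather than using the bundled examples, the OpenVINO encoder is attached to a context via `whisper_ctx_init_openvino_encoder`, the function seen in the log output above. A minimal C sketch, assuming a build with `WHISPER_OPENVINO=1`; see `whisper.h` for the exact parameter semantics:

```c
#include <stdio.h>
#include "whisper.h"

int main(void) {
    // Load the ggml model as usual.
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (!ctx) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // Attach the OpenVINO encoder. Passing NULL for the model path should let
    // the extension derive 'models/ggml-base.en-encoder-openvino.xml' from the
    // ggml model path; "GPU" selects an Intel GPU ("CPU" also works).
    whisper_ctx_init_openvino_encoder(ctx, NULL, "GPU", NULL);

    // ... run whisper_full(...) as usual; the Encoder now uses OpenVINO ...

    whisper_free(ctx);
    return 0;
}
```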

For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).

## NVIDIA GPU support via cuBLAS

With NVIDIA cards the Encoder processing can to a large extent be offloaded to the GPU through cuBLAS.