Ryan Metcalfe committed
Commit 481cf7a · unverified · 1 Parent(s): 13afdc6

readme : add OpenVINO support details (#1112)

Files changed (1): README.md (+80 -0)
README.md CHANGED
@@ -22,6 +22,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
  - [Partial GPU support for NVIDIA via cuBLAS](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
  - [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
  - [BLAS CPU support via OpenBLAS](https://github.com/ggerganov/whisper.cpp#blas-cpu-support-via-openblas)
+ - [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
  - [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)

  Supported platforms:
@@ -311,6 +312,85 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in

  For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggerganov/whisper.cpp/pull/566).

+ ## OpenVINO support
+
+ On platforms that support [OpenVINO](https://github.com/openvinotoolkit/openvino), the Encoder inference can be executed on OpenVINO-supported devices, including x86 CPUs and Intel GPUs (integrated and discrete).
+
+ This can result in a significant speedup in encoder performance. Here are the instructions for generating the OpenVINO model and using it with `whisper.cpp`:
+
+ - First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended. (A quick sanity check for the new environment follows the two snippets below.)
+
+ Windows:
+ ```
+ cd models
+ python -m venv openvino_conv_env
+ openvino_conv_env\Scripts\activate
+ python -m pip install --upgrade pip
+ pip install -r openvino-conversion-requirements.txt
+ ```
+
+ Linux and macOS:
+ ```
+ cd models
+ python3 -m venv openvino_conv_env
+ source openvino_conv_env/bin/activate
+ python -m pip install --upgrade pip
+ pip install -r openvino-conversion-requirements.txt
+ ```
+
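+ As a sanity check that the environment is active and usable, you can print the installed toolkit version (a sketch; this assumes `openvino-conversion-requirements.txt` installs the `openvino` package, whose `openvino.runtime` module provides `get_version()`):
+ ```
+ # should print the installed OpenVINO version if the venv is set up correctly
+ python -c "from openvino.runtime import get_version; print(get_version())"
+ ```
+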
+ - Generate an OpenVINO encoder model. For example, to generate a `base.en` model, use:
+
+ ```
+ python convert-whisper-to-openvino.py --model base.en
+ ```
+
+ This will produce `ggml-base.en-encoder-openvino.xml` and `ggml-base.en-encoder-openvino.bin` IR model files. It is recommended to keep these in the same folder as the `ggml` models, as that is the default location the OpenVINO extension searches at runtime (see the listing just below).
+
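+ With the `base.en` model converted, the `models` folder would look something like this (a sketch of the expected layout, not verbatim output):
+ ```
+ ls models
+ # ggml-base.en.bin                    <- ggml model used by whisper.cpp
+ # ggml-base.en-encoder-openvino.bin   <- OpenVINO IR weights
+ # ggml-base.en-encoder-openvino.xml   <- OpenVINO IR topology
+ ```
+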
+ - Build `whisper.cpp` with OpenVINO support:
+
+ Download an OpenVINO package from the [release page](https://github.com/openvinotoolkit/openvino/releases). The recommended version to use is [2023.0.0](https://github.com/openvinotoolkit/openvino/releases/tag/2023.0.0).
+
+ After downloading and extracting the package on your development system, set up the required environment by sourcing the `setupvars` script. For example:
+
+ Linux:
+ ```bash
+ source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
+ ```
+
+ Windows (cmd):
+ ```
+ C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
+ ```
+
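+ To confirm that the environment took effect in the current shell, check one of the variables the script exports (a sketch; OpenVINO's `setupvars.sh` sets `INTEL_OPENVINO_DIR`, among others):
+ ```bash
+ # should print the path of the extracted toolkit set by setupvars.sh
+ echo $INTEL_OPENVINO_DIR
+ ```
+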
+ Then build the project using CMake:
+ ```bash
+ cd build
+ cmake -DWHISPER_OPENVINO=1 ..      # configure with the OpenVINO extension enabled
+ cmake --build . --config Release   # compile
+ ```
+
+ - Run the examples as usual. For example:
+ ```bash
+ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
+
+ ...
+
+ whisper_ctx_init_openvino_encoder: loading OpenVINO model from 'models/ggml-base.en-encoder-openvino.xml'
+ whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
+ whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = GPU, cache_dir = models/ggml-base.en-encoder-openvino-cache
+ whisper_ctx_init_openvino_encoder: OpenVINO model loaded
+
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 1 |
+
+ ...
+ ```
+
+ The first run on an OpenVINO device is slow, since the OpenVINO framework compiles the IR (Intermediate Representation) model into a device-specific 'blob'. This blob is cached for subsequent runs.
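+
+ After that first run, the cache directory named in the log above should exist next to the model (a sketch; the blob file names and contents are device- and driver-specific):
+ ```bash
+ ls models/ggml-base.en-encoder-openvino-cache
+ # device-specific compiled blob(s), reused on subsequent runs
+ ```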
+
+ For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
+
  ## NVIDIA GPU support via cuBLAS

  With NVIDIA cards the Encoder processing can to a large extent be offloaded to the GPU through cuBLAS.