mikey-rrr committed
Commit a2bec1d · unverified · 1 Parent(s): 1cf1553

docs : make model options / model install methods clearer (#1806)

* Make models more "discoverable"

* Clean up code block language identifiers

* make 3 options clearer

* undo Prettier formatter change

* docs: `$` shell prompt, consistently

* docs: minor changes

README.md CHANGED
@@ -36,7 +36,7 @@ Supported platforms:
 - [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
 
 The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
-The rest of the code is part of the [ggml](https://github.com/ggerganov/ggml) machine learning library.
+The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
 
 Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications.
 As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)
@@ -61,22 +61,22 @@ Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)
 - Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
 - Various other examples are available in the [examples](examples) folder
 
-The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD
-intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since
-the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
+The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
 
 ## Quick start
 
-First clone the repository.
+First clone the repository:
 
-Then, download one of the Whisper models converted in [ggml format](models). For example:
+```bash
+git clone https://github.com/ggerganov/whisper.cpp.git
+```
+
+Then, download one of the Whisper [models](models/README.md) converted in [`ggml` format](#ggml-format). For example:
 
 ```bash
 bash ./models/download-ggml-model.sh base.en
 ```
 
-If you wish to convert the Whisper models to ggml format yourself, instructions are in [models/README.md](models/README.md).
-
 Now build the [main](examples/main) example and transcribe an audio file like this:
 
 ```bash
@@ -91,7 +91,7 @@ make
 
 For a quick demo, simply run `make base.en`:
 
-```java
+```text
 $ make base.en
 
 cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
@@ -207,7 +207,7 @@ For detailed usage instructions, run: `./main -h`
 Note that the [main](examples/main) example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
 For example, you can use `ffmpeg` like this:
 
-```java
+```bash
 ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
 ```
 
@@ -239,9 +239,9 @@ make large-v3
 
 ## Memory usage
 
-| Model | Disk | Mem |
-| --- | --- | --- |
-| tiny | 75 MiB | ~273 MB |
+| Model | Disk | Mem |
+| ------ | ------- | ------- |
+| tiny | 75 MiB | ~273 MB |
 | base | 142 MiB | ~388 MB |
 | small | 466 MiB | ~852 MB |
 | medium | 1.5 GiB | ~2.1 GB |
@@ -278,7 +278,7 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
 
 - To ensure `coremltools` operates correctly, please confirm that [Xcode](https://developer.apple.com/xcode/) is installed and execute `xcode-select --install` to install the command-line tools.
 - Python 3.10 is recommended.
-- [OPTIONAL] It is recommended to utilize a Python version management system, such as [Miniconda](https://docs.conda.io/en/latest/miniconda.html) for this step:
+- [OPTIONAL] It is recommended to utilize a Python version management system, such as [Miniconda](https://docs.conda.io/en/latest/miniconda.html) for this step:
   - To create an environment, use: `conda create -n py310-whisper python=3.10 -y`
   - To activate the environment, use: `conda activate py310-whisper`
 
@@ -304,8 +304,8 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
 
 - Run the examples as usual. For example:
 
-```bash
-./main -m models/ggml-base.en.bin -f samples/jfk.wav
+```text
+$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 
 ...
 
@@ -333,7 +333,8 @@ This can result in significant speedup in encoder performance. Here are the inst
 - First, setup python virtual env. and install python dependencies. Python 3.10 is recommended.
 
   Windows:
-  ```
+
+  ```powershell
   cd models
   python -m venv openvino_conv_env
   openvino_conv_env\Scripts\activate
@@ -342,7 +343,8 @@ This can result in significant speedup in encoder performance. Here are the inst
   ```
 
   Linux and macOS:
-  ```
+
+  ```bash
   cd models
   python3 -m venv openvino_conv_env
   source openvino_conv_env/bin/activate
@@ -356,7 +358,7 @@ This can result in significant speedup in encoder performance. Here are the inst
   python convert-whisper-to-openvino.py --model base.en
   ```
 
-This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as ggml models, as that
+This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as `ggml` models, as that
 is the default location that the OpenVINO extension will search at runtime.
 
 - Build `whisper.cpp` with OpenVINO support:
@@ -366,24 +368,28 @@ This can result in significant speedup in encoder performance. Here are the inst
 After downloading & extracting package onto your development system, set up required environment by sourcing setupvars script. For example:
 
 Linux:
+
 ```bash
 source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
 ```
 
 Windows (cmd):
-```
+
+```powershell
 C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
 ```
 
 And then build the project using cmake:
+
 ```bash
 cmake -B build -DWHISPER_OPENVINO=1
 cmake --build build -j --config Release
 ```
 
 - Run the examples as usual. For example:
-```bash
-./main -m models/ggml-base.en.bin -f samples/jfk.wav
+
+```text
+$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 
 ...
 
@@ -434,7 +440,6 @@ cmake -B build -DWHISPER_CLBLAST=ON
 cmake --build build -j --config Release
 ```
 
-
 Run all the examples as usual.
 
 ## BLAS CPU support via OpenBLAS
@@ -452,10 +457,12 @@ WHISPER_OPENBLAS=1 make -j
 ## Docker
 
 ### Prerequisites
-* Docker must be installed and running on your system.
-* Create a folder to store big models & intermediate files (ex. /whisper/models)
+
+- Docker must be installed and running on your system.
+- Create a folder to store big models & intermediate files (ex. /whisper/models)
 
 ### Images
+
 We have two Docker images available for this project:
 
 1. `ghcr.io/ggerganov/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
@@ -491,7 +498,7 @@ in about half a minute on a MacBook M1 Pro, using `medium.en` model:
 <details>
   <summary>Expand to see the result</summary>
 
-```java
+```text
 $ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
 
 whisper_init_from_file: loading model from 'models/ggml-medium.en.bin'
@@ -563,6 +570,7 @@ whisper_print_timings: encode time = 18665.10 ms / 9 runs ( 2073.90 ms per
 whisper_print_timings: decode time = 13090.93 ms / 549 runs ( 23.85 ms per run)
 whisper_print_timings: total time = 32733.52 ms
 ```
+
 </details>
 
 ## Real-time audio input example
@@ -571,7 +579,7 @@ This is a naive example of performing real-time inference on audio from your mic
 The [stream](examples/stream) tool samples the audio every half a second and runs the transcription continuously.
 More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
 
-```java
+```bash
 make stream
 ./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
 ```
@@ -583,7 +591,7 @@ https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a
 Adding the `--print-colors` argument will print the transcribed text using an experimental color coding strategy
 to highlight words with high or low confidence:
 
-```java
+```bash
 ./main -m models/ggml-base.en.bin -f samples/gb0.wav --print-colors
 ```
 
@@ -593,8 +601,8 @@ to highlight words with high or low confidence:
 
 For example, to limit the line length to a maximum of 16 characters, simply add `-ml 16`:
 
-```java
-./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
+```text
+$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
 
 whisper_model_load: loading model from './models/ggml-base.en.bin'
 ...
@@ -617,8 +625,8 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
 
 The `--max-len` argument can be used to obtain word-level timestamps. Simply use `-ml 1`:
 
-```java
-./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
+```text
+$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
 
 whisper_model_load: loading model from './models/ggml-base.en.bin'
 ...
@@ -688,7 +696,7 @@ This requires to have `ffmpeg` installed.
 
 Here are a few *"typical"* examples:
 
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -owts
 source ./samples/jfk.wav.wts
 ffplay ./samples/jfk.wav.mp4
@@ -698,7 +706,7 @@ https://user-images.githubusercontent.com/1991296/199337465-dbee4b5e-9aeb-48a3-b
 
 ---
 
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/mm0.wav -owts
 source ./samples/mm0.wav.wts
 ffplay ./samples/mm0.wav.mp4
@@ -708,7 +716,7 @@ https://user-images.githubusercontent.com/1991296/199337504-cc8fd233-0cb7-4920-9
 
 ---
 
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/gb0.wav -owts
 source ./samples/gb0.wav.wts
 ffplay ./samples/gb0.wav.mp4
@@ -722,7 +730,7 @@ https://user-images.githubusercontent.com/1991296/199337538-b7b0c7a3-2753-4a88-a
 
 Use the [extra/bench-wts.sh](https://github.com/ggerganov/whisper.cpp/blob/master/extra/bench-wts.sh) script to generate a video in the following format:
 
-```java
+```bash
 ./extra/bench-wts.sh samples/jfk.wav
 ffplay ./samples/jfk.wav.all.mp4
 ```
@@ -751,8 +759,7 @@ It is written in python with the intention of being easy to modify and extend fo
 
 It outputs a csv file with the results of the benchmarking.
 
-
-## ggml format
+## `ggml` format
 
 The original models are converted to a custom binary format. This allows to pack everything needed into a single file:
 
@@ -767,51 +774,50 @@ or manually from here:
 - https://huggingface.co/ggerganov/whisper.cpp
 - https://ggml.ggerganov.com
 
-For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or the README
-in [models](models).
+For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or [models/README.md](models/README.md).
 
 ## [Bindings](https://github.com/ggerganov/whisper.cpp/discussions/categories/bindings)
 
-- [X] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
-- [X] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
+- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
+- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
   - React Native (iOS / Android): [whisper.rn](https://github.com/mybigday/whisper.rn)
-- [X] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
-- [X] Java:
+- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
+- [x] Java:
   - [GiviMAD/whisper-jni](https://github.com/GiviMAD/whisper-jni)
-- [X] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
-- [X] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
+- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
+- [x] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
   - [exPHAT/SwiftWhisper](https://github.com/exPHAT/SwiftWhisper)
-- [X] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
+- [x] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
   - [sandrohanea/whisper.net](https://github.com/sandrohanea/whisper.net)
   - [NickDarvey/whisper](https://github.com/NickDarvey/whisper)
-- [X] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
+- [x] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
  - [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
  - [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)
-- [X] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
-- [X] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
+- [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
+- [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
 
 ## Examples
 
 There are various examples of using the library for different projects in the [examples](examples) folder.
 Some of the examples are even ported to run in the browser using WebAssembly. Check them out!
 
-| Example | Web | Description |
-| --- | --- | --- |
-| [main](examples/main) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
-| [bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
-| [stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
-| [command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
-| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
-| [talk](examples/talk) | [talk.wasm](examples/talk.wasm) | Talk with a GPT-2 bot |
-| [talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
-| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
-| [whisper.swiftui](examples/whisper.swiftui) | | SwiftUI iOS / macOS application using whisper.cpp |
-| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
-| [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
-| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
-| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
-| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
-| [server](examples/server) | | HTTP transcription server with OAI-like API |
+| Example | Web | Description |
+| --------------------------------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| [main](examples/main) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
+| [bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
+| [stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
+| [command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
+| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
+| [talk](examples/talk) | [talk.wasm](examples/talk.wasm) | Talk with a GPT-2 bot |
+| [talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
+| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
+| [whisper.swiftui](examples/whisper.swiftui) | | SwiftUI iOS / macOS application using whisper.cpp |
+| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
+| [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
+| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
+| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
+| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
+| [server](examples/server) | | HTTP transcription server with OAI-like API |
 
 ## [Discussions](https://github.com/ggerganov/whisper.cpp/discussions)
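The README hunks above settle on `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav` for preparing 16-bit WAV input. As a side sketch (not part of this commit or of whisper.cpp), a hypothetical `wav_name` helper shows how a batch-conversion script might derive the output filename; only the `ffmpeg` invocation itself comes from the README:

```shell
#!/bin/sh
# Hypothetical helper (not in whisper.cpp): derive the .wav output path
# for the 16 kHz mono s16le conversion shown in the README.
wav_name() {
  in=$1
  # strip the last extension, if any, then append .wav
  printf '%s.wav\n' "${in%.*}"
}

# usage sketch (requires ffmpeg; not executed here):
#   ffmpeg -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$(wav_name "$src")"
```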
 
bindings/javascript/README.md CHANGED
@@ -41,7 +41,7 @@ make publish-npm
 
 ## Sample run
 
-```java
+```text
 $ node --experimental-wasm-threads --experimental-wasm-simd ../tests/test-whisper.js
 
 whisper_model_load: loading model from 'whisper.bin'
@@ -63,7 +63,7 @@ whisper_model_load: ggml ctx size = 140.60 MB
 whisper_model_load: memory size = 22.83 MB
 whisper_model_load: model size = 140.54 MB
 
-system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | NEON = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 |
+system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | NEON = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 |
 
 operator(): processing 176000 samples, 11.0 sec, 8 threads, 1 processors, lang = en, task = transcribe ...
 
examples/stream/README.md CHANGED
@@ -4,7 +4,7 @@ This is a naive example of performing real-time inference on audio from your mic
 The `stream` tool samples the audio every half a second and runs the transcription continously.
 More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
 
-```java
+```bash
 ./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
 ```
 
@@ -14,7 +14,7 @@ https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a
 
 Setting the `--step` argument to `0` enables the sliding window mode:
 
-```java
+```bash
 ./stream -m ./models/ggml-small.en.bin -t 6 --step 0 --length 30000 -vth 0.6
 ```
 
@@ -39,8 +39,8 @@ brew install sdl2
 make stream
 ```
 
-Ensure you are at the root of the repo when running `make stream`. Not within the `examples/stream` dir
-as the libraries needed like `common-sdl.h` are located within `examples`. Attempting to compile within
+Ensure you are at the root of the repo when running `make stream`. Not within the `examples/stream` dir
+as the libraries needed like `common-sdl.h` are located within `examples`. Attempting to compile within
 `examples/steam` means your compiler cannot find them and it gives an error it cannot find the file.
 
 ```bash
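The stream README hunk above stresses running `make stream` from the repository root, because headers like `common-sdl.h` live under `examples/`. A tiny hypothetical pre-flight check (assumed repo layout; not part of whisper.cpp) could encode that rule:

```shell
#!/bin/sh
# Hypothetical check (not in whisper.cpp): succeed only when the current
# directory looks like the whisper.cpp repo root, i.e. it has a Makefile
# and the examples/common-sdl.h header the stream build needs.
at_repo_root() {
  [ -f Makefile ] && [ -f examples/common-sdl.h ]
}

# usage sketch:
#   at_repo_root || { echo "run make stream from the repo root" >&2; exit 1; }
```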
examples/whisper.objc/README.md CHANGED
@@ -11,11 +11,11 @@ https://user-images.githubusercontent.com/1991296/204126266-ce4177c6-6eca-4bd9-b
 
 ## Usage
 
-```java
+```bash
 git clone https://github.com/ggerganov/whisper.cpp
 open whisper.cpp/examples/whisper.objc/whisper.objc.xcodeproj/
 
-// If you don't want to convert a Core ML model, you can skip this step by create dummy model
+# if you don't want to convert a Core ML model, you can skip this step by create dummy model
 mkdir models/ggml-base.en-encoder.mlmodelc
 ```
 
models/README.md CHANGED
@@ -1,19 +1,16 @@
-## Whisper model files in custom ggml format
+## Whisper model files in custom `ggml` format
 
-The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
+The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L30)
 are converted to custom `ggml` format in order to be able to load them in C/C++.
 Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
 
-You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
-or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
-Currently, they are hosted on the following locations:
+There are three ways to obtain `ggml` models:
 
-- https://huggingface.co/ggerganov/whisper.cpp
-- https://ggml.ggerganov.com
+### 1. Use [download-ggml-model.sh](download-ggml-model.sh) to download pre-converted models
 
-Sample download:
+Example download:
 
-```java
+```text
 $ ./download-ggml-model.sh base.en
 Downloading ggml model base.en ...
 models/ggml-base.en.bin 100%[=============================================>] 141.11M 5.41MB/s in 22s
@@ -23,35 +20,46 @@ You can now use it like this:
 $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 ```
 
-To convert the files yourself, use the convert-pt-to-ggml.py script. Here is an example usage.
-The original PyTorch files are assumed to have been downloaded into ~/.cache/whisper
-Change `~/path/to/repo/whisper/` to the location for your copy of the Whisper source:
-```
+### 2. Manually download pre-converted models
+
+`ggml` models are available from the following locations:
+
+- https://huggingface.co/ggerganov/whisper.cpp/tree/main
+- https://ggml.ggerganov.com
+
+### 3. Convert with [convert-pt-to-ggml.py](convert-pt-to-ggml.py)
+
+Download one of the [models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L30) and generate the `ggml` files using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
+
+Example conversion, assuming the original PyTorch files have been downloaded into `~/.cache/whisper`. Change `~/path/to/repo/whisper/` to the location for your copy of the Whisper source:
+
+```bash
 mkdir models/whisper-medium
 python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
 mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
 rmdir models/whisper-medium
 ```
 
-A third option to obtain the model files is to download them from Hugging Face:
-
-https://huggingface.co/ggerganov/whisper.cpp/tree/main
-
 ## Available models
 
-| Model | Disk | SHA |
-| --- | --- | --- |
-| tiny | 75 MiB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
-| tiny.en | 75 MiB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
-| base | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
-| base.en | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
-| small | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
-| small.en | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
-| medium | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
-| medium.en | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
-| large-v1 | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
-| large-v2 | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
-| large-v3 | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
+| Model         | Disk    | SHA                                        |
+| ------------- | ------- | ------------------------------------------ |
+| tiny          | 75 MiB  | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
+| tiny.en       | 75 MiB  | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
+| base          | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
+| base.en       | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
+| small         | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
+| small.en      | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
+| small.en-tdrz | 465 MiB | `b6c6e7e89af1a35c08e6de56b66ca6a02a2fdfa1` |
+| medium        | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
+| medium.en     | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
+| large-v1      | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
+| large-v2      | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+| large-v2-q5_0 | 1.1 GiB | `00e39f2196344e901b3a2bd5814807a769bd1630` |
+| large-v3      | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
+| large-v3-q5_0 | 1.1 GiB | `e6e2ed78495d403bef4b7cff42ef4aaadcfea8de` |
+
+Models are multilingual unless the model name includes `.en`. Models ending in `-q5_0` are [quantized](../README.md#quantization). Models ending in `-tdrz` support local diarization (marking of speaker turns) using [tinydiarize](https://github.com/akashmjn/tinydiarize). More information about models is available [upstream (openai/whisper)](https://github.com/openai/whisper#available-models-and-languages). The list above is a subset of the models supported by the [download-ggml-model.sh](download-ggml-model.sh) script, but many more are available at https://huggingface.co/ggerganov/whisper.cpp/tree/main and elsewhere.
 
 ## Model files for testing purposes
 
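The SHA column added to the table above can be used to check a download. A minimal sketch, assuming the listed values are SHA-1 digests (40 hex characters) of the `.bin` files and that `shasum` is available; the path and expected value here are taken from the `base.en` row:

```shell
# Hedged sketch: verify a downloaded model against the SHA from the table.
file="models/ggml-base.en.bin"
expected="137c40403d78fd54d454da0f9bd998f78703390c"  # base.en row
if [ -f "$file" ]; then
  # shasum prints "<digest>  <filename>"; keep only the digest field
  actual="$(shasum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum mismatch: $actual" >&2
  fi
fi
```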
models/download-ggml-model.sh CHANGED
@@ -9,6 +9,9 @@
 src="https://huggingface.co/ggerganov/whisper.cpp"
 pfx="resolve/main/ggml"
 
+BOLD="\033[1m"
+RESET='\033[0m'
+
 # get the path of this script
 get_script_path() {
     if [ -x "$(command -v realpath)" ]; then
@@ -22,17 +25,17 @@ get_script_path() {
 models_path="${2:-$(get_script_path)}"
 
 # Whisper models
-models="tiny.en
-tiny
+models="tiny
+tiny.en
 tiny-q5_1
 tiny.en-q5_1
-base.en
 base
+base.en
 base-q5_1
 base.en-q5_1
+small
 small.en
 small.en-tdrz
-small
 small-q5_1
 small.en-q5_1
 medium
@@ -41,14 +44,21 @@ medium-q5_0
 medium.en-q5_0
 large-v1
 large-v2
+large-v2-q5_0
 large-v3
 large-v3-q5_0"
 
 # list available models
 list_models() {
     printf "\n"
-    printf "  Available models:"
+    printf "Available models:"
+    model_class=""
     for model in $models; do
+        this_model_class="${model%%[.-]*}"
+        if [ "$this_model_class" != "$model_class" ]; then
+            printf "\n "
+            model_class=$this_model_class
+        fi
         printf " %s" "$model"
     done
     printf "\n\n"
@@ -57,6 +67,8 @@ list_models() {
 if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
     printf "Usage: %s <model> [models_path]\n" "$0"
     list_models
+    printf "___________________________________________________________\n"
+    printf "${BOLD}.en${RESET} = english-only ${BOLD}-q5_[01]${RESET} = quantized ${BOLD}-tdrz${RESET} = tinydiarize\n"
 
     exit 1
 fi
@@ -98,14 +110,12 @@ else
     exit 1
 fi
 
-
 if [ $? -ne 0 ]; then
     printf "Failed to download ggml model %s \n" "$model"
     printf "Please try again later or download the original Whisper model files and convert them yourself.\n"
     exit 1
 fi
 
-
 printf "Done! Model '%s' saved in '%s/ggml-%s.bin'\n" "$model" "$models_path" "$model"
 printf "You can now use it like this:\n\n"
 printf "  $ ./main -m %s/ggml-%s.bin -f samples/jfk.wav\n" "$models_path" "$model"
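The grouping added to `list_models` relies on POSIX parameter expansion: `"${model%%[.-]*}"` removes the longest suffix matching `[.-]*`, i.e. everything from the first `.` or `-` onward, leaving the model family name so a new line is started whenever the family changes. A small sketch of that behavior on names from the models list:

```shell
# "${model%%[.-]*}" strips everything from the first '.' or '-' onward,
# e.g. tiny.en -> tiny, base.en-q5_1 -> base, large-v3-q5_0 -> large
for model in tiny tiny.en tiny-q5_1 base.en-q5_1 small.en-tdrz large-v3-q5_0; do
  printf '%-13s -> %s\n' "$model" "${model%%[.-]*}"
done
```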