mikey-rrr committed
Commit a2bec1d · unverified · 1 Parent(s): 1cf1553

docs : make model options / model install methods clearer (#1806)

* Make models more "discoverable"

* Clean up code block language identifiers

* make 3 options clearer

* undo Prettier formatter change

* docs: `$` shell prompt, consistently

* docs: minor changes

README.md CHANGED
@@ -36,7 +36,7 @@ Supported platforms:
 - [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
 
 The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
-The rest of the code is part of the [ggml](https://github.com/ggerganov/ggml) machine learning library.
+The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
 
 Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications.
 As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)
@@ -61,22 +61,22 @@ Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)
 - Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
 - Various other examples are available in the [examples](examples) folder
 
-The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD
-intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since
-the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
+The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
 
 ## Quick start
 
-First clone the repository.
+First clone the repository:
 
-Then, download one of the Whisper models converted in [ggml format](models). For example:
+```bash
+git clone https://github.com/ggerganov/whisper.cpp.git
+```
+
+Then, download one of the Whisper [models](models/README.md) converted in [`ggml` format](#ggml-format). For example:
 
 ```bash
 bash ./models/download-ggml-model.sh base.en
 ```
 
-If you wish to convert the Whisper models to ggml format yourself, instructions are in [models/README.md](models/README.md).
-
 Now build the [main](examples/main) example and transcribe an audio file like this:
 
 ```bash
@@ -91,7 +91,7 @@ make
 
 For a quick demo, simply run `make base.en`:
 
-```java
+```text
 $ make base.en
 
 cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
@@ -207,7 +207,7 @@ For detailed usage instructions, run: `./main -h`
 Note that the [main](examples/main) example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
 For example, you can use `ffmpeg` like this:
 
-```java
+```bash
 ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
 ```
 
@@ -239,9 +239,9 @@ make large-v3
 
 ## Memory usage
 
-| Model | Disk | Mem |
-| --- | --- | --- |
-| tiny | 75 MiB | ~273 MB |
+| Model | Disk | Mem |
+| ------ | ------- | ------- |
+| tiny | 75 MiB | ~273 MB |
 | base | 142 MiB | ~388 MB |
 | small | 466 MiB | ~852 MB |
 | medium | 1.5 GiB | ~2.1 GB |
@@ -278,7 +278,7 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
 
 - To ensure `coremltools` operates correctly, please confirm that [Xcode](https://developer.apple.com/xcode/) is installed and execute `xcode-select --install` to install the command-line tools.
 - Python 3.10 is recommended.
-- [OPTIONAL] It is recommended to utilize a Python version management system, such as [Miniconda](https://docs.conda.io/en/latest/miniconda.html) for this step:
+- [OPTIONAL] It is recommended to utilize a Python version management system, such as [Miniconda](https://docs.conda.io/en/latest/miniconda.html) for this step:
   - To create an environment, use: `conda create -n py310-whisper python=3.10 -y`
   - To activate the environment, use: `conda activate py310-whisper`
 
@@ -304,8 +304,8 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
 
 - Run the examples as usual. For example:
 
-```bash
-./main -m models/ggml-base.en.bin -f samples/jfk.wav
+```text
+$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 
 ...
 
@@ -333,7 +333,8 @@ This can result in significant speedup in encoder performance. Here are the inst
 - First, setup python virtual env. and install python dependencies. Python 3.10 is recommended.
 
   Windows:
-  ```
+
+  ```powershell
   cd models
   python -m venv openvino_conv_env
   openvino_conv_env\Scripts\activate
@@ -342,7 +343,8 @@ This can result in significant speedup in encoder performance. Here are the inst
   ```
 
   Linux and macOS:
-  ```
+
+  ```bash
   cd models
   python3 -m venv openvino_conv_env
   source openvino_conv_env/bin/activate
@@ -356,7 +358,7 @@ This can result in significant speedup in encoder performance. Here are the inst
   python convert-whisper-to-openvino.py --model base.en
   ```
 
-This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as ggml models, as that
+This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as `ggml` models, as that
 is the default location that the OpenVINO extension will search at runtime.
 
 - Build `whisper.cpp` with OpenVINO support:
@@ -366,24 +368,28 @@ This can result in significant speedup in encoder performance. Here are the inst
 After downloading & extracting package onto your development system, set up required environment by sourcing setupvars script. For example:
 
 Linux:
+
 ```bash
 source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
 ```
 
 Windows (cmd):
-```
+
+```powershell
 C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
 ```
 
 And then build the project using cmake:
+
 ```bash
 cmake -B build -DWHISPER_OPENVINO=1
 cmake --build build -j --config Release
 ```
 
 - Run the examples as usual. For example:
-```bash
-./main -m models/ggml-base.en.bin -f samples/jfk.wav
+
+```text
+$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 
 ...
 
@@ -434,7 +440,6 @@ cmake -B build -DWHISPER_CLBLAST=ON
 cmake --build build -j --config Release
 ```
 
-
 Run all the examples as usual.
 
 ## BLAS CPU support via OpenBLAS
@@ -452,10 +457,12 @@ WHISPER_OPENBLAS=1 make -j
 ## Docker
 
 ### Prerequisites
-* Docker must be installed and running on your system.
-* Create a folder to store big models & intermediate files (ex. /whisper/models)
+
+- Docker must be installed and running on your system.
+- Create a folder to store big models & intermediate files (ex. /whisper/models)
 
 ### Images
+
 We have two Docker images available for this project:
 
 1. `ghcr.io/ggerganov/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
@@ -491,7 +498,7 @@ in about half a minute on a MacBook M1 Pro, using `medium.en` model:
 <details>
   <summary>Expand to see the result</summary>
 
-```java
+```text
 $ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
 
 whisper_init_from_file: loading model from 'models/ggml-medium.en.bin'
@@ -563,6 +570,7 @@ whisper_print_timings: encode time = 18665.10 ms / 9 runs ( 2073.90 ms per
 whisper_print_timings: decode time = 13090.93 ms / 549 runs ( 23.85 ms per run)
 whisper_print_timings: total time = 32733.52 ms
 ```
+
 </details>
 
 ## Real-time audio input example
@@ -571,7 +579,7 @@ This is a naive example of performing real-time inference on audio from your mic
 The [stream](examples/stream) tool samples the audio every half a second and runs the transcription continuously.
 More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
 
-```java
+```bash
 make stream
 ./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
 ```
@@ -583,7 +591,7 @@ https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a
 Adding the `--print-colors` argument will print the transcribed text using an experimental color coding strategy
 to highlight words with high or low confidence:
 
-```java
+```bash
 ./main -m models/ggml-base.en.bin -f samples/gb0.wav --print-colors
 ```
 
@@ -593,8 +601,8 @@ to highlight words with high or low confidence:
 
 For example, to limit the line length to a maximum of 16 characters, simply add `-ml 16`:
 
-```java
-./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
+```text
+$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
 
 whisper_model_load: loading model from './models/ggml-base.en.bin'
 ...
@@ -617,8 +625,8 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
 
 The `--max-len` argument can be used to obtain word-level timestamps. Simply use `-ml 1`:
 
-```java
-./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
+```text
+$ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
 
 whisper_model_load: loading model from './models/ggml-base.en.bin'
 ...
@@ -688,7 +696,7 @@ This requires to have `ffmpeg` installed.
 
 Here are a few *"typical"* examples:
 
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -owts
 source ./samples/jfk.wav.wts
 ffplay ./samples/jfk.wav.mp4
@@ -698,7 +706,7 @@ https://user-images.githubusercontent.com/1991296/199337465-dbee4b5e-9aeb-48a3-b
 
 ---
 
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/mm0.wav -owts
 source ./samples/mm0.wav.wts
 ffplay ./samples/mm0.wav.mp4
@@ -708,7 +716,7 @@ https://user-images.githubusercontent.com/1991296/199337504-cc8fd233-0cb7-4920-9
 
 ---
 
-```java
+```bash
 ./main -m ./models/ggml-base.en.bin -f ./samples/gb0.wav -owts
 source ./samples/gb0.wav.wts
 ffplay ./samples/gb0.wav.mp4
@@ -722,7 +730,7 @@ https://user-images.githubusercontent.com/1991296/199337538-b7b0c7a3-2753-4a88-a
 
 Use the [extra/bench-wts.sh](https://github.com/ggerganov/whisper.cpp/blob/master/extra/bench-wts.sh) script to generate a video in the following format:
 
-```java
+```bash
 ./extra/bench-wts.sh samples/jfk.wav
 ffplay ./samples/jfk.wav.all.mp4
 ```
@@ -751,8 +759,7 @@ It is written in python with the intention of being easy to modify and extend fo
 
 It outputs a csv file with the results of the benchmarking.
 
-
-## ggml format
+## `ggml` format
 
 The original models are converted to a custom binary format. This allows to pack everything needed into a single file:
 
@@ -767,51 +774,50 @@ or manually from here:
 - https://huggingface.co/ggerganov/whisper.cpp
 - https://ggml.ggerganov.com
 
-For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or the README
-in [models](models).
+For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or [models/README.md](models/README.md).
 
 ## [Bindings](https://github.com/ggerganov/whisper.cpp/discussions/categories/bindings)
 
-- [X] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
-- [X] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
+- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
+- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
   - React Native (iOS / Android): [whisper.rn](https://github.com/mybigday/whisper.rn)
-- [X] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
-- [X] Java:
+- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
+- [x] Java:
   - [GiviMAD/whisper-jni](https://github.com/GiviMAD/whisper-jni)
-- [X] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
-- [X] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
+- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
+- [x] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
   - [exPHAT/SwiftWhisper](https://github.com/exPHAT/SwiftWhisper)
-- [X] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
+- [x] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
   - [sandrohanea/whisper.net](https://github.com/sandrohanea/whisper.net)
   - [NickDarvey/whisper](https://github.com/NickDarvey/whisper)
-- [X] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
+- [x] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
  - [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
  - [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)
-- [X] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
-- [X] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
+- [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
+- [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
 
 ## Examples
 
 There are various examples of using the library for different projects in the [examples](examples) folder.
 Some of the examples are even ported to run in the browser using WebAssembly. Check them out!
 
-| Example | Web | Description |
-| --- | --- | --- |
-| [main](examples/main) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
-| [bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
-| [stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
-| [command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
-| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
-| [talk](examples/talk) | [talk.wasm](examples/talk.wasm) | Talk with a GPT-2 bot |
-| [talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
-| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
-| [whisper.swiftui](examples/whisper.swiftui) | | SwiftUI iOS / macOS application using whisper.cpp |
-| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
-| [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
-| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
-| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
-| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
-| [server](examples/server) | | HTTP transcription server with OAI-like API |
+| Example | Web | Description |
+| --------------------------------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| [main](examples/main) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
+| [bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
+| [stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
+| [command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
+| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
+| [talk](examples/talk) | [talk.wasm](examples/talk.wasm) | Talk with a GPT-2 bot |
+| [talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
+| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
+| [whisper.swiftui](examples/whisper.swiftui) | | SwiftUI iOS / macOS application using whisper.cpp |
+| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
+| [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
+| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
+| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
+| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
+| [server](examples/server) | | HTTP transcription server with OAI-like API |
 
 ## [Discussions](https://github.com/ggerganov/whisper.cpp/discussions)
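The README hunks above settle on `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav` for preparing 16-bit WAV input. As a side sketch (not part of this commit or of whisper.cpp), a hypothetical `wav_name` helper shows how a batch-conversion script might derive the output filename; only the `ffmpeg` invocation itself comes from the README:

```shell
#!/bin/sh
# Hypothetical helper (not in whisper.cpp): derive the .wav output path
# for the 16 kHz mono s16le conversion shown in the README.
wav_name() {
  in=$1
  # strip the last extension, if any, then append .wav
  printf '%s.wav\n' "${in%.*}"
}

# usage sketch (requires ffmpeg; not executed here):
#   ffmpeg -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$(wav_name "$src")"
```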
 
bindings/javascript/README.md CHANGED
@@ -41,7 +41,7 @@ make publish-npm
 
 ## Sample run
 
-```java
+```text
 $ node --experimental-wasm-threads --experimental-wasm-simd ../tests/test-whisper.js
 
 whisper_model_load: loading model from 'whisper.bin'
@@ -63,7 +63,7 @@ whisper_model_load: ggml ctx size = 140.60 MB
 whisper_model_load: memory size = 22.83 MB
 whisper_model_load: model size = 140.54 MB
 
-system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | NEON = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 |
+system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | NEON = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 1 | BLAS = 0 |
 
 operator(): processing 176000 samples, 11.0 sec, 8 threads, 1 processors, lang = en, task = transcribe ...
 
examples/stream/README.md CHANGED
@@ -4,7 +4,7 @@ This is a naive example of performing real-time inference on audio from your mic
 The `stream` tool samples the audio every half a second and runs the transcription continously.
 More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
 
-```java
+```bash
 ./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
 ```
 
@@ -14,7 +14,7 @@ https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a
 
 Setting the `--step` argument to `0` enables the sliding window mode:
 
-```java
+```bash
 ./stream -m ./models/ggml-small.en.bin -t 6 --step 0 --length 30000 -vth 0.6
 ```
 
@@ -39,8 +39,8 @@ brew install sdl2
 make stream
 ```
 
-Ensure you are at the root of the repo when running `make stream`. Not within the `examples/stream` dir
-as the libraries needed like `common-sdl.h` are located within `examples`. Attempting to compile within
+Ensure you are at the root of the repo when running `make stream`. Not within the `examples/stream` dir
+as the libraries needed like `common-sdl.h` are located within `examples`. Attempting to compile within
 `examples/steam` means your compiler cannot find them and it gives an error it cannot find the file.
 
 ```bash
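The stream README hunk above stresses running `make stream` from the repository root, because headers like `common-sdl.h` live under `examples/`. A tiny hypothetical pre-flight check (assumed repo layout; not part of whisper.cpp) could encode that rule:

```shell
#!/bin/sh
# Hypothetical check (not in whisper.cpp): succeed only when the current
# directory looks like the whisper.cpp repo root, i.e. it has a Makefile
# and the examples/common-sdl.h header the stream build needs.
at_repo_root() {
  [ -f Makefile ] && [ -f examples/common-sdl.h ]
}

# usage sketch:
#   at_repo_root || { echo "run make stream from the repo root" >&2; exit 1; }
```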
examples/whisper.objc/README.md CHANGED
@@ -11,11 +11,11 @@ https://user-images.githubusercontent.com/1991296/204126266-ce4177c6-6eca-4bd9-b
 
 ## Usage
 
-```java
+```bash
 git clone https://github.com/ggerganov/whisper.cpp
 open whisper.cpp/examples/whisper.objc/whisper.objc.xcodeproj/
 
-// If you don't want to convert a Core ML model, you can skip this step by create dummy model
+# if you don't want to convert a Core ML model, you can skip this step by create dummy model
 mkdir models/ggml-base.en-encoder.mlmodelc
 ```
 
models/README.md CHANGED
@@ -1,19 +1,16 @@
-## Whisper model files in custom ggml format
+## Whisper model files in custom `ggml` format
 
-The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
+The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L30)
 are converted to custom `ggml` format in order to be able to load them in C/C++.
 Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
 
-You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
-or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
-Currently, they are hosted on the following locations:
+There are three ways to obtain `ggml` models:
 
-- https://huggingface.co/ggerganov/whisper.cpp
-- https://ggml.ggerganov.com
+### 1. Use [download-ggml-model.sh](download-ggml-model.sh) to download pre-converted models
 
-Sample download:
+Example download:
 
-```java
+```text
 $ ./download-ggml-model.sh base.en
 Downloading ggml model base.en ...
 models/ggml-base.en.bin 100%[=============================================>] 141.11M 5.41MB/s in 22s
@@ -23,35 +20,46 @@ You can now use it like this:
 $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 ```
 
-To convert the files yourself, use the convert-pt-to-ggml.py script. Here is an example usage.
-The original PyTorch files are assumed to have been downloaded into ~/.cache/whisper
-Change `~/path/to/repo/whisper/` to the location for your copy of the Whisper source:
-```
+### 2. Manually download pre-converted models
+
+`ggml` models are available from the following locations:
+
+- https://huggingface.co/ggerganov/whisper.cpp/tree/main
+- https://ggml.ggerganov.com
+
+### 3. Convert with [convert-pt-to-ggml.py](convert-pt-to-ggml.py)
+
+Download one of the [models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L30) and generate the `ggml` files using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
+
+Example conversion, assuming the original PyTorch files have been downloaded into `~/.cache/whisper`. Change `~/path/to/repo/whisper/` to the location for your copy of the Whisper source:
+
+```bash
 mkdir models/whisper-medium
 python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
 mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
 rmdir models/whisper-medium
 ```
 
-A third option to obtain the model files is to download them from Hugging Face:
-
-https://huggingface.co/ggerganov/whisper.cpp/tree/main
-
 ## Available models
 
-| Model | Disk | SHA |
-| --- | --- | --- |
-| tiny | 75 MiB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
-| tiny.en | 75 MiB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
-| base | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
-| base.en | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
-| small | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
-| small.en | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
-| medium | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
-| medium.en | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
-| large-v1 | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
-| large-v2 | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
-| large-v3 | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
+| Model         | Disk    | SHA                                        |
+| ------------- | ------- | ------------------------------------------ |
+| tiny          | 75 MiB  | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
+| tiny.en       | 75 MiB  | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
+| base          | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
+| base.en       | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
+| small         | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
+| small.en      | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
+| small.en-tdrz | 465 MiB | `b6c6e7e89af1a35c08e6de56b66ca6a02a2fdfa1` |
+| medium        | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
+| medium.en     | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
+| large-v1      | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
+| large-v2      | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+| large-v2-q5_0 | 1.1 GiB | `00e39f2196344e901b3a2bd5814807a769bd1630` |
+| large-v3      | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
+| large-v3-q5_0 | 1.1 GiB | `e6e2ed78495d403bef4b7cff42ef4aaadcfea8de` |
+
+Models are multilingual unless the model name includes `.en`. Models ending in `-q5_0` are [quantized](../README.md#quantization). Models ending in `-tdrz` support local diarization (marking of speaker turns) using [tinydiarize](https://github.com/akashmjn/tinydiarize). More information about models is available [upstream (openai/whisper)](https://github.com/openai/whisper#available-models-and-languages). The list above is a subset of the models supported by the [download-ggml-model.sh](download-ggml-model.sh) script, but many more are available at https://huggingface.co/ggerganov/whisper.cpp/tree/main and elsewhere.
 
 ## Model files for testing purposes
 
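The SHA column added to the table above can be used to check a download. A minimal sketch, assuming the listed values are SHA-1 digests (40 hex characters) of the `.bin` files and that `shasum` is available; the path and expected value here are taken from the `base.en` row:

```shell
# Hedged sketch: verify a downloaded model against the SHA from the table.
file="models/ggml-base.en.bin"
expected="137c40403d78fd54d454da0f9bd998f78703390c"  # base.en row
if [ -f "$file" ]; then
  # shasum prints "<digest>  <filename>"; keep only the digest field
  actual="$(shasum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum mismatch: $actual" >&2
  fi
fi
```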
models/download-ggml-model.sh CHANGED
@@ -9,6 +9,9 @@
 src="https://huggingface.co/ggerganov/whisper.cpp"
 pfx="resolve/main/ggml"
 
+BOLD="\033[1m"
+RESET='\033[0m'
+
 # get the path of this script
 get_script_path() {
     if [ -x "$(command -v realpath)" ]; then
@@ -22,17 +25,17 @@ get_script_path() {
 models_path="${2:-$(get_script_path)}"
 
 # Whisper models
-models="tiny.en
-tiny
+models="tiny
+tiny.en
 tiny-q5_1
 tiny.en-q5_1
-base.en
 base
+base.en
 base-q5_1
 base.en-q5_1
+small
 small.en
 small.en-tdrz
-small
 small-q5_1
 small.en-q5_1
 medium
@@ -41,14 +44,21 @@ medium-q5_0
 medium.en-q5_0
 large-v1
 large-v2
+large-v2-q5_0
 large-v3
 large-v3-q5_0"
 
 # list available models
 list_models() {
     printf "\n"
-    printf "  Available models:"
+    printf "Available models:"
+    model_class=""
     for model in $models; do
+        this_model_class="${model%%[.-]*}"
+        if [ "$this_model_class" != "$model_class" ]; then
+            printf "\n "
+            model_class=$this_model_class
+        fi
         printf " %s" "$model"
     done
     printf "\n\n"
@@ -57,6 +67,8 @@ list_models() {
 if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
     printf "Usage: %s <model> [models_path]\n" "$0"
     list_models
+    printf "___________________________________________________________\n"
+    printf "${BOLD}.en${RESET} = english-only ${BOLD}-q5_[01]${RESET} = quantized ${BOLD}-tdrz${RESET} = tinydiarize\n"
 
     exit 1
 fi
@@ -98,14 +110,12 @@ else
     exit 1
 fi
 
-
 if [ $? -ne 0 ]; then
     printf "Failed to download ggml model %s \n" "$model"
     printf "Please try again later or download the original Whisper model files and convert them yourself.\n"
     exit 1
 fi
 
-
 printf "Done! Model '%s' saved in '%s/ggml-%s.bin'\n" "$model" "$models_path" "$model"
 printf "You can now use it like this:\n\n"
 printf "  $ ./main -m %s/ggml-%s.bin -f samples/jfk.wav\n" "$models_path" "$model"
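The grouping added to `list_models` relies on POSIX parameter expansion: `"${model%%[.-]*}"` removes the longest suffix matching `[.-]*`, i.e. everything from the first `.` or `-` onward, leaving the model family name so a new line is started whenever the family changes. A small sketch of that behavior on names from the models list:

```shell
# "${model%%[.-]*}" strips everything from the first '.' or '-' onward,
# e.g. tiny.en -> tiny, base.en-q5_1 -> base, large-v3-q5_0 -> large
for model in tiny tiny.en tiny-q5_1 base.en-q5_1 small.en-tdrz large-v3-q5_0; do
  printf '%-13s -> %s\n' "$model" "${model%%[.-]*}"
done
```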