ggerganov committed
Commit 05261df (unverified) · 1 parent: c71bca8

extra : compute SHA of all models files

Files changed (3):
  1. README.md +24 -17
  2. extra/sha-all.sh +7 -0
  3. models/README.md +14 -0
README.md CHANGED
@@ -59,8 +59,8 @@ For a quick demo, simply run `make base.en`:
 ```java
 $ make base.en
 
-cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c
-c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp
+cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
+c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
 c++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main -framework Accelerate
 ./main -h
 
@@ -70,13 +70,17 @@ options:
 -h, --help show this help message and exit
 -s SEED, --seed SEED RNG seed (default: -1)
 -t N, --threads N number of threads to use during computation (default: 4)
+-p N, --processors N number of processors to use during computation (default: 1)
 -ot N, --offset-t N time offset in milliseconds (default: 0)
 -on N, --offset-n N segment index offset (default: 0)
+-mc N, --max-context N maximum number of text context tokens to store (default: max)
+-wt N, --word-thold N word timestamp probability threshold (default: 0.010000)
 -v, --verbose verbose output
 --translate translate from source language to english
 -otxt, --output-txt output result in a text file
 -ovtt, --output-vtt output result in a vtt file
 -osrt, --output-srt output result in a srt file
+-owts, --output-words output word-level timestamps to a text file
 -ps, --print_special print special tokens
 -pc, --print_colors print colors
 -nt, --no_timestamps do not print timestamps
@@ -86,7 +90,7 @@ options:
 
 bash ./models/download-ggml-model.sh base.en
 Downloading ggml model base.en ...
-ggml-base.en.bin 100%[========================>] 141.11M 6.34MB/s in 24s
+ggml-base.en.bin 100%[========================>] 141.11M 6.34MB/s in 24s
 Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
 You can now use it like this:
 
@@ -114,23 +118,26 @@ whisper_model_load: n_text_layer = 6
 whisper_model_load: n_mels = 80
 whisper_model_load: f16 = 1
 whisper_model_load: type = 2
-whisper_model_load: mem_required = 505.00 MB
+whisper_model_load: mem_required = 670.00 MB
 whisper_model_load: adding 1607 extra tokens
-whisper_model_load: ggml ctx size = 163.43 MB
+whisper_model_load: ggml ctx size = 140.60 MB
 whisper_model_load: memory size = 22.83 MB
 whisper_model_load: model size = 140.54 MB
 
+system_info: n_threads = 4 / 10 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 |
+
-main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, lang = en, task = transcribe, timestamps = 1 ...
+main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
 
-[00:00.000 --> 00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
+[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
 
 
-whisper_print_timings: load time = 87.21 ms
-whisper_print_timings: mel time = 24.26 ms
-whisper_print_timings: sample time = 3.87 ms
-whisper_print_timings: encode time = 323.67 ms / 53.94 ms per layer
-whisper_print_timings: decode time = 83.25 ms / 13.87 ms per layer
-whisper_print_timings: total time = 522.66 ms
+whisper_print_timings: load time = 105.91 ms
+whisper_print_timings: mel time = 24.62 ms
+whisper_print_timings: sample time = 3.63 ms
+whisper_print_timings: encode time = 324.71 ms / 54.12 ms per layer
+whisper_print_timings: decode time = 83.58 ms / 13.93 ms per layer
+whisper_print_timings: total time = 542.81 ms
 ```
 
 The command downloads the `base.en` model converted to custom `ggml` format and runs the inference on all `.wav` samples in the folder `samples`.
@@ -172,8 +179,8 @@ make large
 
 | Model | Disk | Mem | SHA |
 | --- | --- | --- | --- |
-| tiny | 75 MB | ~280 MB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
-| base | 142 MB | ~430 MB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
+| tiny | 75 MB | ~390 MB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
+| base | 142 MB | ~500 MB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
 | small | 466 MB | ~1.0 GB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
 | medium | 1.5 GB | ~2.6 GB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
 | large | 2.9 GB | ~4.7 GB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
@@ -185,7 +192,7 @@ in about half a minute on a MacBook M1 Pro, using `medium.en` model:
 
 <details>
 <summary>Expand to see the result</summary>
-
+
 ```java
 $ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
 
@@ -315,7 +322,7 @@ https://user-images.githubusercontent.com/1991296/199337538-b7b0c7a3-2753-4a88-a
 ## Implementation details
 
 - The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
-- The high-level C-style API is implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
+- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
 - Sample usage is demonstrated in [main.cpp](examples/main)
 - Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
 - Various other examples are available in the [examples](examples) folder
extra/sha-all.sh ADDED
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+# Compute the SHA1 of all model files in ./models/ggml-*.bin
+
+for f in ./models/ggml-*.bin; do
+    shasum "$f" -a 1
+done
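The script above only prints hashes; a natural companion is to compare a file's hash against an expected value. This is a minimal sketch (not part of the commit) using the same `shasum -a 1` invocation; the helper name and the example path/hash in the comment are illustrative:

```shell
#!/bin/bash
# verify_sha1: hypothetical helper that checks whether a file's SHA1
# matches an expected value, printing OK or MISMATCH accordingly.

verify_sha1() {
    local file="$1" expected="$2" actual
    # shasum prints "<hash>  <file>"; keep only the hash field
    actual=$(shasum -a 1 "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, want $expected)" >&2
        return 1
    fi
}

# Illustrative usage (path and hash taken from the models table):
# verify_sha1 ./models/ggml-base.en.bin 465707469ff3a37a2b9b8d8f89f2f99de7299dac
```

`shasum` is used rather than `sha1sum` to match the commit's script; it ships with macOS and most Linux distributions as part of Perl.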
models/README.md CHANGED
@@ -22,6 +22,20 @@ A third option to obtain the model files is to download them from Hugging Face:
 
 https://huggingface.co/datasets/ggerganov/whisper.cpp/tree/main
 
+## Available models
+
+| Model | Disk | Mem | SHA |
+| --- | --- | --- | --- |
+| tiny | 75 MB | ~390 MB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
+| tiny.en | 75 MB | ~390 MB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
+| base | 142 MB | ~500 MB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
+| base.en | 142 MB | ~500 MB | `137c40403d78fd54d454da0f9bd998f78703390c` |
+| small | 466 MB | ~1.0 GB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
+| small.en | 466 MB | ~1.0 GB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
+| medium | 1.5 GB | ~2.6 GB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
+| medium.en | 1.5 GB | ~2.6 GB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
+| large | 2.9 GB | ~4.7 GB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
+
 ## Model files for testing purposes
 
 The model files prefixed with `for-tests-` are empty (i.e. do not contain any weights) and are used by the CI for testing purposes.
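The SHA column in the table above lets a user confirm that a downloaded model is intact. As a sketch (not part of the commit; the function name and directory argument are illustrative), the table can be turned into a check that verifies whichever model files are present:

```shell
#!/bin/bash
# check_models: hypothetical helper that compares the SHA1 of each
# existing ggml-<model>.bin file in a directory against the expected
# values from the models table, skipping files that are absent.

check_models() {
    local dir="$1" model sha f actual
    while read -r model sha; do
        f="$dir/ggml-$model.bin"
        [ -f "$f" ] || continue
        actual=$(shasum -a 1 "$f" | awk '{print $1}')
        if [ "$actual" = "$sha" ]; then
            echo "$model: OK"
        else
            echo "$model: MISMATCH (got $actual)"
        fi
    done <<'EOF'
tiny bd577a113a864445d4c299885e0cb97d4ba92b5f
tiny.en c78c86eb1a8faa21b369bcd33207cc90d64ae9df
base 465707469ff3a37a2b9b8d8f89f2f99de7299dac
base.en 137c40403d78fd54d454da0f9bd998f78703390c
small 55356645c2b361a969dfd0ef2c5a50d530afd8d5
small.en db8a495a91d927739e50b3fc1cc4c6b8f6c2d022
medium fd9727b6e1217c2f614f9b698455c4ffd82463b4
medium.en 8c30f0e44ce9560643ebd10bbe50cd20eafd3723
large b1caaf735c4cc1429223d5a74f0f4d0b9b59a299
EOF
}

check_models ./models
```

A here-document is used instead of a bash 4 associative array so the sketch also runs under the bash 3.2 that macOS ships.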