ggerganov committed on
Commit 0c70188 · 1 Parent(s): 43e9712

Update README.md

  1. README.md +12 -13
README.md CHANGED
@@ -135,7 +135,7 @@ The command downloads the `base.en` model converted to custom `ggml` format and
135
 
136
  For detailed usage instructions, run: `./main -h`
137
 
138
- Note that `whisper.cpp` currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
139
  For example, you can use `ffmpeg` like this:
140
 
141
  ```java
@@ -171,6 +171,9 @@ make large
171
  Here is another example of transcribing a [3:24 min speech](https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg)
172
  in about half a minute on a MacBook M1 Pro, using `medium.en` model:
173
 
174
  ```java
175
  $ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
176
 
@@ -237,6 +240,7 @@ whisper_print_timings: encode time = 19552.61 ms / 814.69 ms per layer
237
  whisper_print_timings: decode time = 13249.96 ms / 552.08 ms per layer
238
  whisper_print_timings: total time = 33686.27 ms
239
  ```
 
240
 
241
  ## Real-time audio input example
242
 
@@ -250,18 +254,6 @@ More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/i
250
 
251
  https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
252
 
253
- The [stream](examples/stream) tool depends on SDL2 library to capture audio from the microphone. You can build it like this:
254
-
255
- ```bash
256
- # Install SDL2 on Linux
257
- sudo apt-get install libsdl2-dev
258
-
259
- # Install SDL2 on Mac OS
260
- brew install sdl2
261
-
262
- make stream
263
- ```
264
-
265
  ## Confidence color-coding
266
 
267
  Adding the `--print-colors` argument will print the transcribed text using an experimental color coding strategy
@@ -306,6 +298,13 @@ the Accelerate framework utilizes the special-purpose AMX coprocessor available
306
  | medium | 1.5 GB | ~2.6 GB |
307
  | large | 2.9 GB | ~4.7 GB |
308
 
309
 
310
  ## ggml format
311
 
 
135
 
136
  For detailed usage instructions, run: `./main -h`
137
 
138
+ Note that the [main](examples/main) example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
139
  For example, you can use `ffmpeg` like this:
140
 
141
  ```java
 
171
  Here is another example of transcribing a [3:24 min speech](https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg)
172
  in about half a minute on a MacBook M1 Pro, using `medium.en` model:
173
 
174
+ <details>
175
+ <summary>Expand to see the result</summary>
176
+
177
  ```java
178
  $ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
179
 
 
240
  whisper_print_timings: decode time = 13249.96 ms / 552.08 ms per layer
241
  whisper_print_timings: total time = 33686.27 ms
242
  ```
243
+ </details>
244
 
245
  ## Real-time audio input example
246
 
 
254
 
255
  https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
256
 
257
  ## Confidence color-coding
258
 
259
  Adding the `--print-colors` argument will print the transcribed text using an experimental color coding strategy
 
298
  | medium | 1.5 GB | ~2.6 GB |
299
  | large | 2.9 GB | ~4.7 GB |
300
 
301
+ ## Benchmarks
302
+
303
+ In order to have an objective comparison of the performance of the inference across different system configurations,
304
+ use the [bench](examples/bench) tool. The tool simply runs the Encoder part of the model and prints how much time it
305
+ took to execute it. The results are summarized in the following GitHub issue:
306
+
307
+ [Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)
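
Reviewer note: the new Benchmarks section describes timing one part of the model so that different systems can be compared. The general idea can be sketched as below. This is only an illustration, not the `bench` tool's implementation: `time_call`, `summarize`, and `fake_encoder_pass` are hypothetical names, and the workload is a stand-in rather than the whisper.cpp Encoder.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Run fn() `repeats` times and return the wall-clock duration of each run in ms."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms):
    """Reduce a list of durations to min / median / max for easy comparison."""
    return {
        "min_ms": min(durations_ms),
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
    }


def fake_encoder_pass():
    # Hypothetical stand-in workload; NOT the whisper.cpp Encoder.
    sum(i * i for i in range(50_000))


if __name__ == "__main__":
    print(summarize(time_call(fake_encoder_pass)))
```

Reporting the minimum and median alongside the maximum makes one-off slow runs (for example, from a cold cache) easier to spot when comparing machines.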
308
 
309
  ## ggml format
310