Update README.md
README.md
@@ -135,7 +135,7 @@ The command downloads the `base.en` model converted to custom `ggml` format and
 
 For detailed usage instructions, run: `./main -h`
 
-Note that
+Note that the [main](examples/main) example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.
 For example, you can use `ffmpeg` like this:
 
 ```java
@@ -171,6 +171,9 @@ make large
 Here is another example of transcribing a [3:24 min speech](https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg)
 in about half a minute on a MacBook M1 Pro, using `medium.en` model:
 
+<details>
+<summary>Expand to see the result</summary>
+
 ```java
 $ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
 
@@ -237,6 +240,7 @@ whisper_print_timings: encode time = 19552.61 ms / 814.69 ms per layer
 whisper_print_timings: decode time = 13249.96 ms / 552.08 ms per layer
 whisper_print_timings: total time = 33686.27 ms
 ```
+</details>
 
 ## Real-time audio input example
 
@@ -250,18 +254,6 @@ More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/i
 
 https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
 
-The [stream](examples/stream) tool depends on SDL2 library to capture audio from the microphone. You can build it like this:
-
-```bash
-# Install SDL2 on Linux
-sudo apt-get install libsdl2-dev
-
-# Install SDL2 on Mac OS
-brew install sdl2
-
-make stream
-```
-
 ## Confidence color-coding
 
 Adding the `--print-colors` argument will print the transcribed text using an experimental color coding strategy
@@ -306,6 +298,13 @@ the Accelerate framework utilizes the special-purpose AMX coprocessor available
 | medium | 1.5 GB | ~2.6 GB |
 | large | 2.9 GB | ~4.7 GB |
 
+## Benchmarks
+
+In order to have an objective comparison of the performance of the inference across different system configurations,
+use the [bench](examples/bench) tool. The tool simply runs the Encoder part of the model and prints how much time it
+took to execute it. The results are summarized in the following Github issue:
+
+[Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)
 
 ## ggml format
 
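The note added in the first hunk says the `main` example only accepts 16-bit WAV input. As a rough pre-flight check before running `./main`, the sketch below reads the WAV header and tests the "audio format" and "bits per sample" fields. It is a hypothetical helper, not part of whisper.cpp, and it assumes the canonical header layout in which the `fmt ` chunk immediately follows the 12-byte RIFF header; a valid file with other chunks before `fmt ` would be rejected. The function names `read_le16`, `read_tag` and `is_pcm16_wav` are illustrative only.

```bash
# Hypothetical pre-flight check (not part of whisper.cpp): does FILE look like
# a 16-bit PCM WAV? Assumes the canonical header layout, in which the "fmt "
# chunk starts right after the 12-byte RIFF header.

# read_le16 FILE OFFSET: print the unsigned little-endian 16-bit value at OFFSET.
read_le16() {
    # "od -t u1" prints one unsigned decimal per byte, so the result does not
    # depend on the byte order of the host machine.
    set -- $(od -An -v -t u1 -j "$2" -N 2 "$1" 2>/dev/null)
    [ $# -eq 2 ] || return 1      # file too short to contain the field
    echo $(( $1 + 256 * $2 ))
}

# read_tag FILE OFFSET: print the 4 raw bytes at OFFSET (RIFF chunk IDs are ASCII).
read_tag() {
    dd if="$1" bs=1 skip="$2" count=4 2>/dev/null
}

# is_pcm16_wav FILE: exit status 0 if the header declares 16-bit integer PCM.
is_pcm16_wav() {
    [ "$(read_tag "$1" 0)" = "RIFF" ] || return 1
    [ "$(read_tag "$1" 8)" = "WAVE" ] || return 1
    [ "$(read_tag "$1" 12)" = "fmt " ] || return 1
    [ "$(read_le16 "$1" 20)" = "1" ] || return 1    # audio format 1 = integer PCM
    [ "$(read_le16 "$1" 34)" = "16" ] || return 1   # bits per sample
}
```

A possible usage is `is_pcm16_wav input.wav && ./main -m models/ggml-medium.en.bin -f input.wav`. The check deliberately covers only what the note states (16-bit samples); it does not inspect the sample rate or channel count, so a file that passes may still need conversion with `ffmpeg`.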