ggerganov committed on
Commit 105ffef · unverified · 1 Parent(s): b69a5a5

Minor updates

Files changed (2)
  1. README.md +10 -5
  2. stream.cpp +5 -0
README.md CHANGED
@@ -7,13 +7,12 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
  - Mixed F16 / F32 precision
  - Low memory usage (Flash Attention + Flash Forward)
  - Zero memory allocations at runtime
- - Runs on the CPU (Mac and Linux)
+ - Runs on the CPU
  - [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
+ - Supported platforms: Linux, Mac OS (Intel and Arm), Raspberry Pi, Android

  Incoming features:
  - [Realtime audio input transcription](https://github.com/ggerganov/whisper.cpp/issues/10#issuecomment-1264665959)
- - [Raspberry Pi support](https://github.com/ggerganov/whisper.cpp/issues/7)
- - [Android support](https://github.com/ggerganov/whisper.cpp/issues/8)

  ## Usage

@@ -220,10 +219,16 @@ $ ./stream -m models/ggml-small.en.bin -t 8

  https://user-images.githubusercontent.com/1991296/193465125-c163d304-64f6-4f5d-83e5-72239c9a203e.mp4

+ ## Implementation details
+
+ - The core tensor operations are implemented in C (`ggml.h` / `ggml.c`)
+ - The high-level C-style API is implemented in C++ (`whisper.h` / `whisper.cpp`)
+ - Simple usage is demonstrated in `main.cpp`
+ - Sample real-time audio transcription from the microphone is demonstrated in `stream.cpp`
+
  ## Limitations

- - Very basic greedy sampling scheme - always pick up the top token
- - Only 16-bit WAV at 16 kHz is supported
+ - Very basic greedy sampling scheme - always pick up the top token. You can implement your own strategy
  - Inference only
  - No GPU support
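
The greedy scheme named under Limitations amounts to an argmax over the per-token scores at each decoding step. The sketch below is only an illustration of that idea: `pick_top_token` and the `logits` vector are hypothetical names that do not come from `whisper.h`, and the way the real decoder exposes its scores may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of whisper.h: one greedy decoding step.
// Returns the index of the largest score, i.e. the "top token".
std::size_t pick_top_token(const std::vector<float> & logits) {
    if (logits.empty()) {
        throw std::invalid_argument("logits must not be empty");
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) {
            best = i; // strict '>' keeps the first index on ties
        }
    }
    return best;
}
```

A custom strategy (for example temperature sampling or beam search) would replace this single function while the rest of the decoding loop stays the same.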
stream.cpp CHANGED
@@ -265,6 +265,11 @@ int main(int argc, char ** argv) {

  wparams.print_progress = false;
  wparams.print_special_tokens = params.print_special_tokens;
+ wparams.print_realtime = false;
+ wparams.print_timestamps = !params.no_timestamps;
+ wparams.translate = params.translate;
+ wparams.language = params.language.c_str();
+ wparams.n_threads = params.n_threads;

  if (whisper_full(ctx, wparams, pcmf32.data(), pcmf32.size()) != 0) {
      fprintf(stderr, "%s: failed to process audio\n", argv[0]);
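
The added lines forward the command-line options into the inference parameters before `whisper_full` is called. A minimal, self-contained sketch of that forwarding step follows. The structs `cli_params` and `full_params`, their default values, and the helper `make_full_params` are hypothetical stand-ins introduced for illustration; the real definitions live in `stream.cpp` and `whisper.h` and may differ. Two details are worth noting: `print_timestamps` is the negation of the `no_timestamps` flag, and `language` borrows the buffer of a `std::string` through `c_str()`, so the source string must stay alive and unmodified while the parameters are in use.

```cpp
#include <string>

// Hypothetical stand-in for the example's command-line options.
struct cli_params {
    int         n_threads            = 4;
    bool        translate            = false;
    bool        no_timestamps        = false;
    bool        print_special_tokens = false;
    std::string language             = "en";
};

// Hypothetical stand-in for the subset of inference parameters
// touched in this hunk.
struct full_params {
    int          n_threads            = 1;
    bool         translate            = false;
    bool         print_realtime       = false;
    bool         print_progress       = false;
    bool         print_timestamps     = true;
    bool         print_special_tokens = false;
    const char * language             = "en";
};

// Copies the user-facing options into the inference parameters.
// `cli` must outlive the returned struct, because `language` points
// into the string's internal buffer.
full_params make_full_params(const cli_params & cli) {
    full_params wparams;
    wparams.print_progress       = false;
    wparams.print_special_tokens = cli.print_special_tokens;
    wparams.print_realtime       = false;
    wparams.print_timestamps     = !cli.no_timestamps; // CLI flag is negated
    wparams.translate            = cli.translate;
    wparams.language             = cli.language.c_str();
    wparams.n_threads            = cli.n_threads;
    return wparams;
}
```

Passing `cli` by const reference rather than by value is deliberate here: a by-value copy would be destroyed when the function returns and leave `language` dangling.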