Minor updates
- README.md +10 -5
- stream.cpp +5 -0
README.md
@@ -7,13 +7,12 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - Mixed F16 / F32 precision
 - Low memory usage (Flash Attention + Flash Forward)
 - Zero memory allocations at runtime
-- Runs on the CPU
+- Runs on the CPU
 - [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
+- Supported platforms: Linux, Mac OS (Intel and Arm), Raspberry Pi, Android
 
 Incoming features:
 - [Realtime audio input transcription](https://github.com/ggerganov/whisper.cpp/issues/10#issuecomment-1264665959)
-- [Raspberry Pi support](https://github.com/ggerganov/whisper.cpp/issues/7)
-- [Android support](https://github.com/ggerganov/whisper.cpp/issues/8)
 
 ## Usage
 
@@ -220,10 +219,16 @@ $ ./stream -m models/ggml-small.en.bin -t 8
 
 https://user-images.githubusercontent.com/1991296/193465125-c163d304-64f6-4f5d-83e5-72239c9a203e.mp4
 
+## Implementation details
+
+- The core tensor operations are implemented in C (`ggml.h` / `ggml.c`)
+- The high-level C-style API is implemented in C++ (`whisper.h` / `whisper.cpp`)
+- Simple usage is demonstrated in `main.cpp`
+- Sample real-time audio transcription from the microphone is demonstrated in `stream.cpp`
+
 ## Limitations
 
-- Very basic greedy sampling scheme - always pick up the top token
-- Only 16-bit WAV at 16 kHz is supported
+- Very basic greedy sampling scheme - always pick up the top token. You can implement your own strategy
 - Inference only
 - No GPU support
 
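
The updated Limitations entry describes the sampling scheme as greedy ("always pick up the top token") and says a custom strategy can be implemented. As a rough illustration of what that means, here is a minimal sketch. Both function names are hypothetical and are not part of `whisper.h`; the sketch assumes the decoder exposes one score (logit) per vocabulary token as a `std::vector<float>`, which the diff does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of whisper.h): greedy sampling returns the
// index of the largest logit. Ties resolve to the lowest index.
std::size_t sample_greedy(const std::vector<float> & logits) {
    assert(!logits.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}

// One possible custom strategy (also hypothetical): greedy selection that
// skips a list of banned token ids. Returns logits.size() if every token
// is banned, so the caller can detect that nothing was selectable.
std::size_t sample_greedy_excluding(const std::vector<float> & logits,
                                    const std::vector<std::size_t> & banned) {
    std::size_t best = logits.size();
    for (std::size_t i = 0; i < logits.size(); ++i) {
        bool is_banned = false;
        for (std::size_t b : banned) {
            if (b == i) {
                is_banned = true;
                break;
            }
        }
        if (is_banned) {
            continue;
        }
        if (best == logits.size() || logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```

For example, `sample_greedy({0.1f, 2.5f, -1.0f})` selects index 1, while excluding token 1 falls back to index 0. Other strategies (temperature sampling, beam search) would replace the selection step in the same place.
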
stream.cpp
@@ -265,6 +265,11 @@ int main(int argc, char ** argv) {
 
     wparams.print_progress = false;
     wparams.print_special_tokens = params.print_special_tokens;
+    wparams.print_realtime = false;
+    wparams.print_timestamps = !params.no_timestamps;
+    wparams.translate = params.translate;
+    wparams.language = params.language.c_str();
+    wparams.n_threads = params.n_threads;
 
     if (whisper_full(ctx, wparams, pcmf32.data(), pcmf32.size()) != 0) {
         fprintf(stderr, "%s: failed to process audio\n", argv[0]);
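
The `stream.cpp` hunk copies command-line options into the parameter struct before calling `whisper_full`. The sketch below isolates that mapping so it can be read and tested on its own. The two structs are stand-ins written for this illustration: they reproduce only the fields named in the diff, and their types, layout, and default values are assumptions rather than the definitions in `whisper.h` or `stream.cpp`.

```cpp
#include <cassert>
#include <string>

// Stand-in for the example's parsed command-line options (hypothetical;
// default values are arbitrary).
struct cli_params {
    int         n_threads            = 4;
    bool        translate            = false;
    bool        no_timestamps        = false;
    bool        print_special_tokens = false;
    std::string language             = "en";
};

// Stand-in for the library's parameter struct (hypothetical layout).
struct full_params_sketch {
    int          n_threads;
    bool         translate;
    bool         print_progress;
    bool         print_realtime;
    bool         print_timestamps;
    bool         print_special_tokens;
    const char * language;
};

// Mirrors the assignments in the hunk. `language` borrows the buffer owned
// by `params.language`, so `params` must outlive the returned struct and
// the string must not be modified while the struct is in use.
full_params_sketch make_full_params(const cli_params & params) {
    assert(params.n_threads > 0);
    full_params_sketch w{};
    w.print_progress       = false;
    w.print_special_tokens = params.print_special_tokens;
    w.print_realtime       = false;
    w.print_timestamps     = !params.no_timestamps;
    w.translate            = params.translate;
    w.language             = params.language.c_str();
    w.n_threads            = params.n_threads;
    return w;
}
```

The point to note is the `c_str()` assignment: the pointer is only valid while the owning `std::string` is alive and unchanged, which holds in the diff because `params` lives for the duration of `main`.
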