rename : ggerganov -> ggml-org (#3005)
Changed files:
- README.md +32 -33
- bindings/go/README.md +2 -2
- bindings/go/doc.go +1 -1
- bindings/java/README.md +3 -3
- bindings/ruby/README.md +3 -3
- examples/bench/README.md +2 -2
- examples/command/command.cpp +1 -1
- examples/livestream.sh +1 -1
- examples/twitch.sh +1 -1
- examples/whisper.nvim/whisper.nvim +2 -2
- examples/whisper.wasm/README.md +1 -1
- examples/yt-wsp.sh +5 -5
- models/README.md +3 -4
- models/convert-h5-to-ggml.py +2 -2
- src/whisper.cpp +2 -2
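The change is a mechanical URL rename across all of these files. A minimal sketch of the substitution (hypothetical — this is not necessarily how the commit was produced):

```shell
# Rewrite a github.com/ggerganov/... link to the new ggml-org organization.
# Repo-wide, the same substitution would be driven by e.g. `grep -rl` + `sed -i`.
old='Stable: https://github.com/ggerganov/whisper.cpp/releases/tag/v1.7.5'
renamed=$(printf '%s\n' "$old" | sed 's|github.com/ggerganov/|github.com/ggml-org/|g')
echo "$renamed"
```

Note the `|` delimiter in the `sed` expression, which avoids escaping the slashes in the URLs.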
README.md

````diff
@@ -2,12 +2,12 @@
-[](https://github.com/ggerganov/whisper.cpp/actions)
+[](https://github.com/ggml-org/whisper.cpp/actions)
 [](https://opensource.org/licenses/MIT)
 [](https://conan.io/center/whisper-cpp)
 [](https://www.npmjs.com/package/whisper.cpp/)

-Stable: [v1.7.5](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.7.5) / [Roadmap](https://github.com/orgs/ggerganov/projects/4/)
+Stable: [v1.7.5](https://github.com/ggml-org/whisper.cpp/releases/tag/v1.7.5) / [Roadmap](https://github.com/orgs/ggml-org/projects/4/)

 High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:

@@ -23,7 +23,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - [Efficient GPU support for NVIDIA](#nvidia-gpu-support)
 - [OpenVINO Support](#openvino-support)
 - [Ascend NPU Support](#ascend-npu-support)
-- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/include/whisper.h)
+- [C-style API](https://github.com/ggml-org/whisper.cpp/blob/master/include/whisper.h)

 Supported platforms:

@@ -31,14 +31,14 @@ Supported platforms:
 - [x] [iOS](examples/whisper.objc)
 - [x] [Android](examples/whisper.android)
 - [x] [Java](bindings/java/README.md)
-- [x] Linux / [FreeBSD](https://github.com/ggerganov/whisper.cpp/issues/56#issuecomment-1350920264)
+- [x] Linux / [FreeBSD](https://github.com/ggml-org/whisper.cpp/issues/56#issuecomment-1350920264)
 - [x] [WebAssembly](examples/whisper.wasm)
-- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168))
+- [x] Windows ([MSVC](https://github.com/ggml-org/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggml-org/whisper.cpp/issues/168))
-- [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
+- [x] [Raspberry Pi](https://github.com/ggml-org/whisper.cpp/discussions/166)
-- [x] [Docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
+- [x] [Docker](https://github.com/ggml-org/whisper.cpp/pkgs/container/whisper.cpp)

 The entire high-level implementation of the model is contained in [whisper.h](include/whisper.h) and [whisper.cpp](src/whisper.cpp).
-The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
+The rest of the code is part of the [`ggml`](https://github.com/ggml-org/ggml) machine learning library.

 Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications.
 As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)

@@ -51,14 +51,14 @@ https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a
 On Apple Silicon, the inference runs fully on the GPU via Metal:

-https://github.com/ggerganov/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225
+https://github.com/ggml-org/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225

 ## Quick start

 First clone the repository:

 ```bash
-git clone https://github.com/ggerganov/whisper.cpp.git
+git clone https://github.com/ggml-org/whisper.cpp.git
 ```

 Navigate into the directory:

@@ -222,7 +222,7 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in
 The first run on a device is slow, since the ANE service compiles the Core ML model to some device-specific format.
 Next runs are faster.

-For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggerganov/whisper.cpp/pull/566).
+For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggml-org/whisper.cpp/pull/566).

 ## OpenVINO support

@@ -307,7 +307,7 @@ This can result in significant speedup in encoder performance. Here are the inst
 The first time run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob will get
 cached for the next run.

-For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
+For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggml-org/whisper.cpp/pull/1037).

 ## NVIDIA GPU support

@@ -385,8 +385,8 @@ Run the inference examples as usual, for example:
 We have two Docker images available for this project:

-1. `ghcr.io/ggerganov/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
+1. `ghcr.io/ggml-org/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
-2. `ghcr.io/ggerganov/whisper.cpp:main-cuda`: Same as `main` but compiled with CUDA support. (platforms: `linux/amd64`)
+2. `ghcr.io/ggml-org/whisper.cpp:main-cuda`: Same as `main` but compiled with CUDA support. (platforms: `linux/amd64`)

 ### Usage

@@ -424,8 +424,8 @@ For detailed instructions on how to use Conan, please refer to the [Conan docume
 This is a naive example of performing real-time inference on audio from your microphone.
 The [stream](examples/stream) tool samples the audio every half a second and runs the transcription continuously.
-More info is available in [issue #10](https://github.com/ggerganov/whisper.cpp/issues/10).
+More info is available in [issue #10](https://github.com/ggml-org/whisper.cpp/issues/10).
-You will need to have [sdl2](https://wiki.libsdl.org/SDL2/Installation) installed for it to work properly.
+You will need to have [sdl2](https://wiki.libsdl.org/SDL2/Installation) installed for it to work properly.

 ```bash
 cmake -B build -DWHISPER_SDL2=ON

@@ -513,7 +513,7 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
 ## Speaker segmentation via tinydiarize (experimental)

-More information about this approach is available here: https://github.com/ggerganov/whisper.cpp/pull/1058
+More information about this approach is available here: https://github.com/ggml-org/whisper.cpp/pull/1058

 Sample usage:

@@ -577,7 +577,7 @@ https://user-images.githubusercontent.com/1991296/199337538-b7b0c7a3-2753-4a88-a
 ## Video comparison of different models

-Use the [scripts/bench-wts.sh](https://github.com/ggerganov/whisper.cpp/blob/master/scripts/bench-wts.sh) script to generate a video in the following format:
+Use the [scripts/bench-wts.sh](https://github.com/ggml-org/whisper.cpp/blob/master/scripts/bench-wts.sh) script to generate a video in the following format:

 ```bash
 ./scripts/bench-wts.sh samples/jfk.wav

@@ -594,7 +594,7 @@ In order to have an objective comparison of the performance of the inference acr
 use the [whisper-bench](examples/bench) tool. The tool simply runs the Encoder part of the model and prints how much time it
 took to execute it. The results are summarized in the following Github issue:

-[Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)
+[Benchmark results](https://github.com/ggml-org/whisper.cpp/issues/89)

 Additionally a script to run whisper.cpp with different models and audio files is provided [bench.py](scripts/bench.py).

@@ -621,25 +621,24 @@ You can download the converted models using the [models/download-ggml-model.sh](
 or manually from here:

 - https://huggingface.co/ggerganov/whisper.cpp
-- https://ggml.ggerganov.com

 For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or [models/README.md](models/README.md).

-## [Bindings](https://github.com/ggerganov/whisper.cpp/discussions/categories/bindings)
+## [Bindings](https://github.com/ggml-org/whisper.cpp/discussions/categories/bindings)

-- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggerganov/whisper.cpp/discussions/310)
+- [x] Rust: [tazz4843/whisper-rs](https://github.com/tazz4843/whisper-rs) | [#310](https://github.com/ggml-org/whisper.cpp/discussions/310)
-- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggerganov/whisper.cpp/discussions/309)
+- [x] JavaScript: [bindings/javascript](bindings/javascript) | [#309](https://github.com/ggml-org/whisper.cpp/discussions/309)
   - React Native (iOS / Android): [whisper.rn](https://github.com/mybigday/whisper.rn)
-- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggerganov/whisper.cpp/discussions/312)
+- [x] Go: [bindings/go](bindings/go) | [#312](https://github.com/ggml-org/whisper.cpp/discussions/312)
 - [x] Java:
   - [GiviMAD/whisper-jni](https://github.com/GiviMAD/whisper-jni)
-- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggerganov/whisper.cpp/discussions/507)
+- [x] Ruby: [bindings/ruby](bindings/ruby) | [#507](https://github.com/ggml-org/whisper.cpp/discussions/507)
-- [x] Objective-C / Swift: [ggerganov/whisper.spm](https://github.com/ggerganov/whisper.spm) | [#313](https://github.com/ggerganov/whisper.cpp/discussions/313)
+- [x] Objective-C / Swift: [ggml-org/whisper.spm](https://github.com/ggml-org/whisper.spm) | [#313](https://github.com/ggml-org/whisper.cpp/discussions/313)
   - [exPHAT/SwiftWhisper](https://github.com/exPHAT/SwiftWhisper)
-- [x] .NET: | [#422](https://github.com/ggerganov/whisper.cpp/discussions/422)
+- [x] .NET: | [#422](https://github.com/ggml-org/whisper.cpp/discussions/422)
   - [sandrohanea/whisper.net](https://github.com/sandrohanea/whisper.net)
   - [NickDarvey/whisper](https://github.com/NickDarvey/whisper)
-- [x] Python: | [#9](https://github.com/ggerganov/whisper.cpp/issues/9)
+- [x] Python: | [#9](https://github.com/ggml-org/whisper.cpp/issues/9)
   - [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
   - [AIWintermuteAI/whispercpp](https://github.com/AIWintermuteAI/whispercpp) (Updated fork of aarnphm/whispercpp)
   - [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)

@@ -667,7 +666,7 @@ let package = Package(
 ]),
 .binaryTarget(
 name: "WhisperFramework",
-url: "https://github.com/ggerganov/whisper.cpp/releases/download/v1.7.5/whisper-v1.7.5-xcframework.zip",
+url: "https://github.com/ggml-org/whisper.cpp/releases/download/v1.7.5/whisper-v1.7.5-xcframework.zip",
 checksum: "c7faeb328620d6012e130f3d705c51a6ea6c995605f2df50f6e1ad68c59c6c4a"
 )
 ]

@@ -692,13 +691,13 @@ Some of the examples are even ported to run in the browser using WebAssembly. Ch
 | [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
 | [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
 | [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
-| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
+| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggml-org/whisper.cpp/issues/185) |
 | [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
 | [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |

-## [Discussions](https://github.com/ggerganov/whisper.cpp/discussions)
+## [Discussions](https://github.com/ggml-org/whisper.cpp/discussions)

 If you have any kind of feedback about this project feel free to use the Discussions section and open a new topic.
-You can use the [Show and tell](https://github.com/ggerganov/whisper.cpp/discussions/categories/show-and-tell) category
+You can use the [Show and tell](https://github.com/ggml-org/whisper.cpp/discussions/categories/show-and-tell) category
 to share your own projects that use `whisper.cpp`. If you have a question, make sure to check the
-[Frequently asked questions (#126)](https://github.com/ggerganov/whisper.cpp/discussions/126) discussion.
+[Frequently asked questions (#126)](https://github.com/ggml-org/whisper.cpp/discussions/126) discussion.
````
bindings/go/README.md

````diff
@@ -51,7 +51,7 @@ func main() {
 In order to build, you need to have the Go compiler installed. You can get it from [here](https://golang.org/dl/). Run the tests with:

 ```bash
-git clone https://github.com/ggerganov/whisper.cpp.git
+git clone https://github.com/ggml-org/whisper.cpp.git
 cd whisper.cpp/bindings/go
 make test
 ```

@@ -98,7 +98,7 @@ The API Documentation:
 Getting help:

-* Follow the discussion for the go bindings [here](https://github.com/ggerganov/whisper.cpp/discussions/312)
+* Follow the discussion for the go bindings [here](https://github.com/ggml-org/whisper.cpp/discussions/312)

 ## License
````
bindings/go/doc.go

````diff
@@ -1,5 +1,5 @@
 /*
-github.com/ggerganov/whisper.cpp/bindings/go
+github.com/ggml-org/whisper.cpp/bindings/go
 provides a speech-to-text service bindings for the Go programming language.
 */
 package whisper
````
bindings/java/README.md

````diff
@@ -31,10 +31,10 @@ public class Example {
 var whisperParams = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
 // custom configuration if required
 whisperParams.temperature_inc = 0f;
-
+
 var samples = readAudio(); // divide each value by 32767.0f
 whisper.fullTranscribe(whisperParams, samples);
-
+
 int segmentCount = whisper.getTextSegmentCount(context);
 for (int i = 0; i < segmentCount; i++) {
 String text = whisper.getTextSegment(context, i);

@@ -52,7 +52,7 @@ public class Example {
 In order to build, you need to have the JDK 8 or higher installed. Run the tests with:

 ```bash
-git clone https://github.com/ggerganov/whisper.cpp.git
+git clone https://github.com/ggml-org/whisper.cpp.git
 cd whisper.cpp/bindings/java

 ./gradlew build
````
bindings/ruby/README.md

````diff
@@ -228,7 +228,7 @@ The second argument `samples` may be an array, an object with `length` and `each
 Development
 -----------

-    % git clone https://github.com/ggerganov/whisper.cpp.git
+    % git clone https://github.com/ggml-org/whisper.cpp.git
     % cd whisper.cpp/bindings/ruby
     % rake test

@@ -241,5 +241,5 @@ License

 The same to [whisper.cpp][].

-[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
+[whisper.cpp]: https://github.com/ggml-org/whisper.cpp
-[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models
+[models]: https://github.com/ggml-org/whisper.cpp/tree/master/models
````
examples/bench/README.md

````diff
@@ -4,7 +4,7 @@ A very basic tool for benchmarking the inference performance on your device. The
 the transformer on some random audio data and records the execution time. This way we can have an objective comparison
 of the performance of the model for various setups.

-Benchmark results are tracked in the following Github issue: https://github.com/ggerganov/whisper.cpp/issues/89
+Benchmark results are tracked in the following Github issue: https://github.com/ggml-org/whisper.cpp/issues/89

 ```bash
 # run the bench tool on the small.en model using 4 threads

@@ -40,7 +40,7 @@ system_info: n_threads = 4 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WA
 If you wish, you can submit these results here:

-https://github.com/ggerganov/whisper.cpp/issues/89
+https://github.com/ggml-org/whisper.cpp/issues/89

 Please include the following information:
````
examples/command/command.cpp

````diff
@@ -3,7 +3,7 @@
 // Speak short text commands to the microphone.
 // This program will detect your voice command and convert them to text.
 //
-// ref: https://github.com/ggerganov/whisper.cpp/issues/171
+// ref: https://github.com/ggml-org/whisper.cpp/issues/171
 //

 #include "common-sdl.h"
````
examples/livestream.sh

````diff
@@ -2,7 +2,7 @@
 #
 # Transcribe audio livestream by feeding ffmpeg output to whisper.cpp at regular intervals
 # Idea by @semiformal-net
-# ref: https://github.com/ggerganov/whisper.cpp/issues/185
+# ref: https://github.com/ggml-org/whisper.cpp/issues/185
 #

 set -eo pipefail
````
examples/twitch.sh

````diff
@@ -2,7 +2,7 @@
 #
 # Transcribe twitch.tv livestream by feeding audio input to whisper.cpp at regular intervals
 # Thanks to @keyehzy
-# ref: https://github.com/ggerganov/whisper.cpp/issues/209
+# ref: https://github.com/ggml-org/whisper.cpp/issues/209
 #
 # The script currently depends on the third-party tool "streamlink"
 # On Mac OS, you can install it via "brew install streamlink"
````
examples/whisper.nvim/whisper.nvim

````diff
@@ -5,7 +5,7 @@
 # This simple script is called by Neovim to capture audio from the microphone and transcribe it with Whisper.
 # In order for this to work, you need to clone the whisper.cpp repo and build the 'stream' tool
 #
-#   git clone https://github.com/ggerganov/whisper.cpp
+#   git clone https://github.com/ggml-org/whisper.cpp
 #   cd whisper.cpp
 #   make stream
 #

@@ -31,7 +31,7 @@
 model="base.en"

 # export the path to the whisper.cpp repo in the WHISPER_CPP_HOME env variable
-# https://github.com/ggerganov/whisper.cpp
+# https://github.com/ggml-org/whisper.cpp
 cd "${WHISPER_CPP_HOME}"

 if [ ! -f ./stream ] ; then
````
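The `whisper.nvim` contract shown above is just an environment variable pointing at the checkout. A tiny sketch of how a caller resolves the `stream` binary (the path is hypothetical; the plugin's own check is the `[ ! -f ./stream ]` test in the hunk):

```shell
# whisper.nvim expects WHISPER_CPP_HOME to point at the whisper.cpp checkout,
# and looks for the prebuilt 'stream' tool inside it.
WHISPER_CPP_HOME="$HOME/src/whisper.cpp"   # hypothetical location
stream_bin="${WHISPER_CPP_HOME}/stream"
if [ ! -f "$stream_bin" ]; then
    echo "stream tool not found at $stream_bin - run 'make stream' in the checkout first"
fi
```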
examples/whisper.wasm/README.md

````diff
@@ -30,7 +30,7 @@ Link: https://ggerganov.github.io/whisper.cpp/

 ```bash (v3.1.2)
 # build using Emscripten
-git clone https://github.com/ggerganov/whisper.cpp
+git clone https://github.com/ggml-org/whisper.cpp
 cd whisper.cpp
 mkdir build-em && cd build-em
 emcmake cmake ..
````
examples/yt-wsp.sh

````diff
@@ -25,12 +25,12 @@
 # SOFTWARE.

 # Small shell script to more easily automatically download and transcribe live stream VODs.
-# This uses YT-DLP, ffmpeg and the CPP version of Whisper: https://github.com/ggerganov/whisper.cpp
+# This uses YT-DLP, ffmpeg and the CPP version of Whisper: https://github.com/ggml-org/whisper.cpp
 # Use `./examples/yt-wsp.sh help` to print help info.
 #
 # Sample usage:
 #
-#   git clone https://github.com/ggerganov/whisper.cpp
+#   git clone https://github.com/ggml-org/whisper.cpp
 #   cd whisper.cpp
 #   make
 #   ./examples/yt-wsp.sh https://www.youtube.com/watch?v=1234567890

@@ -44,7 +44,7 @@ SCRIPT_DIR="${SCRIPT_PATH%/*}"

 ################################################################################
 # Documentation on downloading models can be found in the whisper.cpp repo:
-# https://github.com/ggerganov/whisper.cpp/#usage
+# https://github.com/ggml-org/whisper.cpp/#usage
 #
 # note: unless a multilingual model is specified, WHISPER_LANG will be ignored
 # and the video will be transcribed as if the audio were in the English language

@@ -103,10 +103,10 @@ check_requirements() {
 fi;

 if ! command -v "${WHISPER_EXECUTABLE}" &>/dev/null; then
-echo "The C++ implementation of Whisper is required: https://github.com/ggerganov/whisper.cpp"
+echo "The C++ implementation of Whisper is required: https://github.com/ggml-org/whisper.cpp"
 echo "Sample usage:";
 echo "";
-echo "  git clone https://github.com/ggerganov/whisper.cpp";
+echo "  git clone https://github.com/ggml-org/whisper.cpp";
 echo "  cd whisper.cpp";
 echo "  make";
 echo "  ./examples/yt-wsp.sh https://www.youtube.com/watch?v=1234567890";
````
models/README.md

````diff
@@ -24,8 +24,7 @@ You can now use it like this:
 `ggml` models are available from the following locations:

-- https://huggingface.co/ggerganov/whisper.cpp/tree/main
-- https://ggml.ggerganov.com
+- https://huggingface.co/ggml-org/whisper.cpp/tree/main

 ### 3. Convert with [convert-pt-to-ggml.py](convert-pt-to-ggml.py)

@@ -78,7 +77,7 @@ OpenAI format. To read the HF models you can use the [convert-h5-to-ggml.py](con
 ```bash
 git clone https://github.com/openai/whisper
-git clone https://github.com/ggerganov/whisper.cpp
+git clone https://github.com/ggml-org/whisper.cpp

 # clone HF fine-tuned model (this is just an example)
 git clone https://huggingface.co/openai/whisper-medium

@@ -96,7 +95,7 @@ Currently, the chunk-based transcription strategy is not implemented, so there c
 ```bash
 # clone OpenAI whisper and whisper.cpp
 git clone https://github.com/openai/whisper
-git clone https://github.com/ggerganov/whisper.cpp
+git clone https://github.com/ggml-org/whisper.cpp

 # get the models
 cd whisper.cpp/models
````
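Manual download from the Hugging Face repo listed in the models/README hunks above follows the standard `resolve/main` URL scheme. A sketch of building such a URL (the repo's `download-ggml-model.sh` does the equivalent with error handling; the exact file layout is an assumption based on the listed repo):

```shell
# Build the direct-download URL for a ggml model hosted on Hugging Face.
model="base.en"
src="https://huggingface.co/ggml-org/whisper.cpp"
url="${src}/resolve/main/ggml-${model}.bin"
echo "$url"
# then e.g.: curl -L -o "ggml-${model}.bin" "$url"
```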
models/convert-h5-to-ggml.py

````diff
@@ -3,7 +3,7 @@
 # Usage:
 #
 #  git clone https://github.com/openai/whisper
-#  git clone https://github.com/ggerganov/whisper.cpp
+#  git clone https://github.com/ggml-org/whisper.cpp
 #  git clone https://huggingface.co/openai/whisper-medium
 #
 #  python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .

@@ -12,7 +12,7 @@
 #
 # For more info:
 #
-# https://github.com/ggerganov/whisper.cpp/issues/157
+# https://github.com/ggml-org/whisper.cpp/issues/157
 #

 import io
````
src/whisper.cpp

````diff
@@ -5529,7 +5529,7 @@ int whisper_full_with_state(

 // if length of spectrogram is less than 1.0s (100 frames), then return
 // basically don't process anything that is less than 1.0s
-// see issue #39: https://github.com/ggerganov/whisper.cpp/issues/39
+// see issue #39: https://github.com/ggml-org/whisper.cpp/issues/39
 if (seek_end < seek_start + 100) {
 WHISPER_LOG_WARN("%s: input is too short - %d ms < 1000 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
 return 0;

@@ -6375,7 +6375,7 @@ int whisper_full_with_state(
 }
 }

-// ref: https://github.com/ggerganov/whisper.cpp/pull/2629
+// ref: https://github.com/ggml-org/whisper.cpp/pull/2629
 const bool single_timestamp_ending = tokens_cur.size() > 1 &&
 tokens_cur[tokens_cur.size() - 2].id < whisper_token_beg(ctx) &&
 tokens_cur[tokens_cur.size() - 1].id > whisper_token_beg(ctx);
````
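The guard in the first `src/whisper.cpp` hunk works in 10 ms mel frames: fewer than 100 frames means less than 1000 ms of audio, which is rejected. The same arithmetic as a small shell sketch (illustration only; the real check is the C++ shown above):

```shell
# Spectrogram frames are 10 ms each, so 100 frames = 1000 ms.
# Inputs shorter than that trigger the "input is too short" warning.
seek_start=0
seek_end=80    # e.g. 80 frames of spectrogram
duration_ms=$(( (seek_end - seek_start) * 10 ))
if [ "$seek_end" -lt $(( seek_start + 100 )) ]; then
    echo "input is too short - ${duration_ms} ms < 1000 ms"
fi
```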