readme : add tinydiarize instructions (#1058)
README.md
CHANGED
@@ -115,6 +115,7 @@ options:
 -lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
 -su, --speed-up [false ] speed up audio by x2 (reduced accuracy)
 -tr, --translate [false ] translate from source language to english
+-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
 -di, --diarize [false ] stereo audio diarization
 -nf, --no-fallback [false ] do not use temperature fallback while decoding
 -otxt, --output-txt [false ] output result in a text file
@@ -493,7 +494,7 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
 [00:00:10.020 --> 00:00:11.000] country.
 ```
 
-## Word-level timestamp
+## Word-level timestamp (experimental)
 
 The `--max-len` argument can be used to obtain word-level timestamps. Simply use `-ml 1`:
 
@@ -534,6 +535,32 @@ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 pr
 [00:00:10.510 --> 00:00:11.000] .
 ```
 
+## Speaker segmentation via tinydiarize (experimental)
+
+More information about this approach is available here: https://github.com/ggerganov/whisper.cpp/pull/1058
+
+Sample usage:
+
+```py
+# download a tinydiarize compatible model
+./models/download-ggml-model.sh small.en-tdrz
+
+# run as usual, adding the "-tdrz" command-line argument
+./main -f ./samples/a13.wav -m ./models/ggml-small.en-tdrz.bin -tdrz
+...
+main: processing './samples/a13.wav' (480000 samples, 30.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, tdrz = 1, timestamps = 1 ...
+...
+[00:00:00.000 --> 00:00:03.800] Okay Houston, we've had a problem here. [SPEAKER_TURN]
+[00:00:03.800 --> 00:00:06.200] This is Houston. Say again please. [SPEAKER_TURN]
+[00:00:06.200 --> 00:00:08.260] Uh Houston we've had a problem.
+[00:00:08.260 --> 00:00:11.320] We've had a main beam up on a volt. [SPEAKER_TURN]
+[00:00:11.320 --> 00:00:13.820] Roger main beam interval. [SPEAKER_TURN]
+[00:00:13.820 --> 00:00:15.100] Uh uh [SPEAKER_TURN]
+[00:00:15.100 --> 00:00:18.020] So okay stand, by thirteen we're looking at it. [SPEAKER_TURN]
+[00:00:18.020 --> 00:00:25.740] Okay uh right now uh Houston the uh voltage is uh is looking good um.
+[00:00:27.620 --> 00:00:29.940] And we had a a pretty large bank or so.
+```
+
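Editor's note (not part of the commit): the sample output above marks where a speaker turn ends but carries no speaker labels. As a rough illustration of how that output could be post-processed, here is a minimal Python sketch that groups timestamped segments into turns. It assumes the exact line format shown in the sample; the names `group_turns`, `LINE_RE`, and `TURN_MARKER` are hypothetical and are not part of whisper.cpp.

```python
import re

# Assumed line format, taken from the sample output:
#   [HH:MM:SS.mmm --> HH:MM:SS.mmm] text [SPEAKER_TURN]
LINE_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)
TURN_MARKER = "[SPEAKER_TURN]"


def group_turns(lines):
    """Group (start, end, text) segments into turns, closing a turn after each marker."""
    turns, current = [], []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped segment
        start, end, text = m.groups()
        text = text.rstrip()
        ends_turn = text.endswith(TURN_MARKER)
        if ends_turn:
            text = text[: -len(TURN_MARKER)]
        current.append((start, end, text.strip()))
        if ends_turn:
            turns.append(current)
            current = []
    if current:
        turns.append(current)  # trailing segments with no closing marker
    return turns


sample = [
    "[00:00:00.000 --> 00:00:03.800] Okay Houston, we've had a problem here. [SPEAKER_TURN]",
    "[00:00:03.800 --> 00:00:06.200] This is Houston. Say again please. [SPEAKER_TURN]",
    "[00:00:06.200 --> 00:00:08.260] Uh Houston we've had a problem.",
    "[00:00:08.260 --> 00:00:11.320] We've had a main beam up on a volt. [SPEAKER_TURN]",
]

for i, turn in enumerate(group_turns(sample), 1):
    text = " ".join(seg[2] for seg in turn)
    print(f"turn {i}: {turn[0][0]} -> {turn[-1][1]}  {text}")
```

The sketch only splits on turn boundaries. Deciding which turns belong to the same speaker would need a separate step that the sample output does not provide.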
 ## Karaoke-style movie generation (experimental)
 
 The [main](examples/main) example provides support for output of karaoke-style movies, where the