---
language:
- ml
tags:
- audio
- automatic-speech-recognition
license: mit
datasets:
- google/fleurs
- thennal/IMaSC
- mozilla-foundation/common_voice_11_0
library_name: ctranslate2
---

# vegam-whisper-medium-ml-int8_float16 (വേഗം)

> This model supports int8_float16 quantization only.
> File size: 737 MB

This is a conversion of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.

This model can be used in CTranslate2 or in projects based on CTranslate2 such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).

## Installation

- Install [faster-whisper](https://github.com/guillaumekln/faster-whisper). More installation details can be [found in the faster-whisper README](https://github.com/guillaumekln/faster-whisper/tree/master#installation).

```bash
pip install faster-whisper
```

- Install [git-lfs](https://git-lfs.com/). Note that git-lfs is only needed to download the model from Hugging Face.

```bash
apt-get install git-lfs
```

- Download the model weights:

```bash
git lfs install
git clone https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml-fp16
```

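After cloning, the target directory should contain the converted CTranslate2 weights. A minimal sanity check is sketched below, assuming the clone landed in the current working directory; the expected file list is an assumption based on typical CTranslate2 Whisper conversions, not taken from this repository:

```python
from pathlib import Path

# Hypothetical local path produced by the git clone step above
model_dir = Path("vegam-whisper-medium-ml-fp16")

# Typical contents of a CTranslate2 Whisper conversion (assumption)
expected = ["model.bin", "config.json", "vocabulary.json"]
missing = [name for name in expected if not (model_dir / name).exists()]

if missing:
    print("Missing files (is the clone complete?):", missing)
else:
    print("Model directory looks complete.")
```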
## Usage

```python
from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml-fp16"

# Run on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

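The per-segment print format used above can be captured in a small helper. A sketch, using a stand-in namedtuple (the real segments returned by `model.transcribe` expose the same `start`, `end`, and `text` attributes):

```python
from collections import namedtuple

# Stand-in with the same attributes as faster-whisper's segment objects
Segment = namedtuple("Segment", ["start", "end", "text"])

def format_segment(segment):
    """Render one segment in the [start -> end] text style used above."""
    return "[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text)

print(format_segment(Segment(0.0, 4.74, "പാലം കടുക്കുവോളം നാരായണ")))
# → [0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ
```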
## Example

```python
from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml-fp16"

model = WhisperModel(model_path, device="cuda", compute_type="float16")

segments, info = model.transcribe("00b38e80-80b8-4f70-babf-566e848879fc.webm", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

> Detected language 'ta' with probability 0.353516

> [0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ

Note: The audio file [00b38e80-80b8-4f70-babf-566e848879fc.webm](https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml/blob/main/00b38e80-80b8-4f70-babf-566e848879fc.webm) is from the [Malayalam Speech Corpus](https://blog.smc.org.in/malayalam-speech-corpus/) and is stored along with the model weights.
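The Malayalam clip above was auto-detected as Tamil (`'ta'`). faster-whisper's `transcribe` accepts a `language` argument that skips auto-detection; a sketch wrapping the example this way (the function name is illustrative):

```python
def transcribe_malayalam(model_path, audio_path):
    """Transcribe known-Malayalam audio, skipping language auto-detection."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_path, device="cuda", compute_type="float16")
    # language="ml" forces Malayalam decoding instead of auto-detection
    segments, _info = model.transcribe(audio_path, beam_size=5, language="ml")
    return [(segment.start, segment.end, segment.text) for segment in segments]
```

For the clip above this would be called as `transcribe_malayalam("vegam-whisper-medium-ml-fp16", "00b38e80-80b8-4f70-babf-566e848879fc.webm")`.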

## Conversion Details

This conversion was made possible by the wonderful [CTranslate2 library](https://github.com/OpenNMT/CTranslate2), leveraging the [Transformers converter for OpenAI Whisper](https://opennmt.net/CTranslate2/guides/transformers.html#whisper). The original model was converted with the following command:

```bash
ct2-transformers-converter --model thennal/whisper-medium-ml --output_dir vegam-whisper-medium-ml-fp16 \
    --quantization float16
```

## Many Thanks to

- Creators of CTranslate2 and faster-whisper
- Thennal D K
- Santhosh Thottingal