KitaitiMakoto committed · Commit 1647500 · unverified · 1 Parent(s): 7167f69

ruby : Sync whisper.cpp and model download feature (#2617)


* Use C++17

* Add test for Pathname of model

* Make Whisper::Context#initialize accept Pathname

* Add shorthand for pre-converted models

* Update documents

* Add headings to API section in README [skip ci]

* Remove unused function

* Don't care about no longer included file

* Cosmetic fix

* Use conditional GET when getting model files
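
The last bullet refers to HTTP conditional GET: the client sends the cached file's modification time in an `If-Modified-Since` header, and the server may answer `304 Not Modified` instead of resending the body. A minimal sketch of building such a request with Ruby's standard library follows. `build_conditional_get` is a hypothetical helper name used only for illustration, not something defined in this commit, and no network call is made here.

```ruby
require "net/http"
require "time"     # provides Time#httpdate
require "uri"

# Hypothetical helper: build a GET request that carries If-Modified-Since
# only when a cached copy already exists on disk.
def build_conditional_get(uri, cached_file)
  headers = {}
  if File.exist?(cached_file)
    # Time#httpdate formats the mtime as an RFC 1123 date in GMT.
    headers["if-modified-since"] = File.mtime(cached_file).httpdate
  end
  Net::HTTP::Get.new(uri, headers)
end

# Example URL only; nothing is sent over the network.
request = build_conditional_get(URI("https://example.net/model.bin"), __FILE__)
puts request["if-modified-since"].inspect
```

When the request is actually sent, a `Net::HTTPNotModified` response means the cached file can be reused as-is.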

bindings/ruby/.gitignore CHANGED
@@ -1,3 +1,5 @@
1
  LICENSE
2
  pkg/
3
- lib/whisper.*
 
 
 
1
  LICENSE
2
  pkg/
3
+ lib/whisper.so
4
+ lib/whisper.bundle
5
+ lib/whisper.dll
bindings/ruby/README.md CHANGED
@@ -22,7 +22,7 @@ Usage
22
  ```ruby
23
  require "whisper"
24
 
25
- whisper = Whisper::Context.new("path/to/model.bin")
26
 
27
  params = Whisper::Params.new
28
  params.language = "en"
@@ -41,21 +41,60 @@ end
41
 
42
  ### Preparing model ###
43
 
44
- Use script to download model file(s):
45
 
46
- ```bash
47
- git clone https://github.com/ggerganov/whisper.cpp.git
48
- cd whisper.cpp
49
- sh ./models/download-ggml-model.sh base.en
 
 
 
 
 
50
  ```
51
 
52
- There are some types of models. See [models][] page for details.
53
 
54
  ### Preparing audio file ###
55
 
56
  Currently, whisper.cpp accepts only 16-bit WAV files.
57
 
58
- ### API ###
 
 
 
59
 
60
  Once `Whisper::Context#transcribe` is called, you can retrieve segments with `#each_segment`:
61
 
@@ -107,10 +146,12 @@ whisper.transcribe("path/to/audio.wav", params)
107
 
108
  ```
109
 
 
 
110
  You can see model information:
111
 
112
  ```ruby
113
- whisper = Whisper::Context.new("path/to/model.bin")
114
  model = whisper.model
115
 
116
  model.n_vocab # => 51864
@@ -128,6 +169,8 @@ model.type # => "base"
128
 
129
  ```
130
 
 
 
131
  You can set log callback:
132
 
133
  ```ruby
@@ -160,6 +203,8 @@ Whisper.log_set ->(level, buffer, user_data) {
160
  Whisper::Context.new(MODEL)
161
  ```
162
 
 
 
163
  You can also call `Whisper::Context#full` and `#full_parallel` with a Ruby array as samples. Although `#transcribe` with audio file path is recommended because it extracts PCM samples in C++ and is fast, `#full` and `#full_parallel` give you flexibility.
164
 
165
  ```ruby
@@ -169,7 +214,7 @@ require "wavefile"
169
  reader = WaveFile::Reader.new("path/to/audio.wav", WaveFile::Format.new(:mono, :float, 16000))
170
  samples = reader.enum_for(:each_buffer).map(&:samples).flatten
171
 
172
- whisper = Whisper::Context.new("path/to/model.bin")
173
  whisper.full(Whisper::Params.new, samples)
174
  whisper.each_segment do |segment|
175
  puts segment.text
 
22
  ```ruby
23
  require "whisper"
24
 
25
+ whisper = Whisper::Context.new(Whisper::Model["base"])
26
 
27
  params = Whisper::Params.new
28
  params.language = "en"
 
41
 
42
  ### Preparing model ###
43
 
44
+ Some models are prepared up-front:
45
 
46
+ ```ruby
47
+ base_en = Whisper::Model["base.en"]
48
+ whisper = Whisper::Context.new(base_en)
49
+ ```
50
+
51
+ The first time you use a model, it is downloaded automatically. After that, the cached file is used. To clear the cache, call `#clear_cache`:
52
+
53
+ ```ruby
54
+ Whisper::Model["base"].clear_cache
55
  ```
56
 
57
+ You can list the names of the prepared models with `Whisper::Model.preconverted_model_names`:
58
+
59
+ ```ruby
60
+ puts Whisper::Model.preconverted_model_names
61
+ # tiny
62
+ # tiny.en
63
+ # tiny-q5_1
64
+ # tiny.en-q5_1
65
+ # tiny-q8_0
66
+ # base
67
+ # base.en
68
+ # base-q5_1
69
+ # base.en-q5_1
70
+ # base-q8_0
71
+ # :
72
+ # :
73
+ ```
74
+
75
+ You can also use local model files you prepared:
76
+
77
+ ```ruby
78
+ whisper = Whisper::Context.new("path/to/your/model.bin")
79
+ ```
80
+
81
+ Or, you can download model files:
82
+
83
+ ```ruby
84
+ model_uri = Whisper::Model::URI.new("http://example.net/uri/of/your/model.bin")
85
+ whisper = Whisper::Context.new(model_uri)
86
+ ```
87
+
88
+ See [models][] page for details.
89
 
90
  ### Preparing audio file ###
91
 
92
  Currently, whisper.cpp accepts only 16-bit WAV files.
93
 
94
+ API
95
+ ---
96
+
97
+ ### Segments ###
98
 
99
  Once `Whisper::Context#transcribe` is called, you can retrieve segments with `#each_segment`:
100
 
 
146
 
147
  ```
148
 
149
+ ### Models ###
150
+
151
  You can see model information:
152
 
153
  ```ruby
154
+ whisper = Whisper::Context.new(Whisper::Model["base"])
155
  model = whisper.model
156
 
157
  model.n_vocab # => 51864
 
169
 
170
  ```
171
 
172
+ ### Logging ###
173
+
174
  You can set log callback:
175
 
176
  ```ruby
 
203
  Whisper::Context.new(MODEL)
204
  ```
205
 
206
+ ### Low-level API to transcribe ###
207
+
208
  You can also call `Whisper::Context#full` and `#full_parallel` with a Ruby array as samples. Although `#transcribe` with audio file path is recommended because it extracts PCM samples in C++ and is fast, `#full` and `#full_parallel` give you flexibility.
209
 
210
  ```ruby
 
214
  reader = WaveFile::Reader.new("path/to/audio.wav", WaveFile::Format.new(:mono, :float, 16000))
215
  samples = reader.enum_for(:each_buffer).map(&:samples).flatten
216
 
217
+ whisper = Whisper::Context.new(Whisper::Model["base"])
218
  whisper.full(Whisper::Params.new, samples)
219
  whisper.each_segment do |segment|
220
  puts segment.text
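
The README changes above say that a model is downloaded on first use and read from a cache afterwards. The sketch below shows roughly where that cached file would land, following the directory rules in `lib/whisper/model.rb` from this commit. `cache_path_for` and its keyword arguments are hypothetical, introduced here so that platform, environment, and home directory can be passed in explicitly. The sketch also assumes that `XDG_CACHE_HOME` is meant to be wrapped in `Pathname`.

```ruby
require "pathname"
require "uri"

# Hypothetical helper: compute the cache location for a model URI.
# The layout is <platform cache dir>/whisper.cpp/<host>/<path of the URI>.
def cache_path_for(uri_string, platform: RUBY_PLATFORM, env: ENV, home: Dir.home)
  uri = URI(uri_string)
  base =
    case platform
    when /mswin|mingw/
      env.key?("LOCALAPPDATA") ? Pathname(env["LOCALAPPDATA"]) : Pathname(home) / "AppData/Local"
    when /darwin/
      Pathname(home) / "Library/Caches"
    else
      # Assumption: the environment value is converted to a Pathname.
      env.key?("XDG_CACHE_HOME") ? Pathname(env["XDG_CACHE_HOME"]) : Pathname(home) / ".cache"
    end
  # uri.path starts with "/", so drop the first character before joining.
  base / "whisper.cpp" / uri.host / uri.path[1..]
end

url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
puts cache_path_for(url, platform: "x86_64-linux", env: {}, home: "/home/alice")
# /home/alice/.cache/whisper.cpp/huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```

Passing the environment as an argument keeps the function easy to check on any machine, since it no longer depends on the real `ENV` or the host platform.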
bindings/ruby/Rakefile CHANGED
@@ -18,19 +18,9 @@ EXTSOURCES.each do |src|
18
  end
19
 
20
  CLEAN.include SOURCES
21
- CLEAN.include FileList[
22
- "ext/*.o",
23
- "ext/*.metal",
24
- "ext/whisper.{so,bundle,dll}",
25
- "ext/depend"
26
- ]
27
 
28
- task build: FileList[
29
- "ext/Makefile",
30
- "ext/ruby_whisper.h",
31
- "ext/ruby_whisper.cpp",
32
- "whispercpp.gemspec",
33
- ]
34
 
35
  directory "pkg"
36
  CLOBBER.include "pkg"
 
18
  end
19
 
20
  CLEAN.include SOURCES
21
+ CLEAN.include FileList["ext/*.o", "ext/*.metal", "ext/whisper.{so,bundle,dll}"]
 
 
 
 
 
22
 
23
+ task build: ["ext/Makefile", "ext/ruby_whisper.h", "ext/ruby_whisper.cpp", "whispercpp.gemspec"]
 
 
 
 
 
24
 
25
  directory "pkg"
26
  CLOBBER.include "pkg"
bindings/ruby/ext/.gitignore CHANGED
@@ -2,7 +2,6 @@ Makefile
2
  whisper.so
3
  whisper.bundle
4
  whisper.dll
5
- depend
6
  scripts/get-flags.mk
7
  *.o
8
  *.c
 
2
  whisper.so
3
  whisper.bundle
4
  whisper.dll
 
5
  scripts/get-flags.mk
6
  *.o
7
  *.c
bindings/ruby/ext/extconf.rb CHANGED
@@ -1,7 +1,7 @@
1
  require 'mkmf'
2
 
3
  # need to use c++ compiler flags
4
- $CXXFLAGS << ' -std=c++11'
5
 
6
  $LDFLAGS << ' -lstdc++'
7
 
@@ -35,10 +35,10 @@ if $GGML_METAL
35
  $GGML_METAL_EMBED_LIBRARY = true
36
  end
37
 
38
- $MK_CPPFLAGS = '-Iggml/include -Iggml/src -Iinclude -Isrc -Iexamples'
39
  $MK_CFLAGS = '-std=c11 -fPIC'
40
- $MK_CXXFLAGS = '-std=c++11 -fPIC'
41
- $MK_NVCCFLAGS = '-std=c++11'
42
  $MK_LDFLAGS = ''
43
 
44
  $OBJ_GGML = []
 
1
  require 'mkmf'
2
 
3
  # need to use c++ compiler flags
4
+ $CXXFLAGS << ' -std=c++17'
5
 
6
  $LDFLAGS << ' -lstdc++'
7
 
 
35
  $GGML_METAL_EMBED_LIBRARY = true
36
  end
37
 
38
+ $MK_CPPFLAGS = '-Iggml/include -Iggml/src -Iggml/src/ggml-cpu -Iinclude -Isrc -Iexamples'
39
  $MK_CFLAGS = '-std=c11 -fPIC'
40
+ $MK_CXXFLAGS = '-std=c++17 -fPIC'
41
+ $MK_NVCCFLAGS = '-std=c++17'
42
  $MK_LDFLAGS = ''
43
 
44
  $OBJ_GGML = []
bindings/ruby/ext/ruby_whisper.cpp CHANGED
@@ -45,6 +45,7 @@ static ID id_to_enum;
45
  static ID id_length;
46
  static ID id_next;
47
  static ID id_new;
 
48
 
49
  static bool is_log_callback_finalized = false;
50
 
@@ -194,7 +195,9 @@ static VALUE ruby_whisper_params_allocate(VALUE klass) {
194
 
195
  /*
196
  * call-seq:
 
197
  * new("path/to/model.bin") -> Whisper::Context
 
198
  */
199
  static VALUE ruby_whisper_initialize(int argc, VALUE *argv, VALUE self) {
200
  ruby_whisper *rw;
@@ -204,6 +207,9 @@ static VALUE ruby_whisper_initialize(int argc, VALUE *argv, VALUE self) {
204
  rb_scan_args(argc, argv, "01", &whisper_model_file_path);
205
  Data_Get_Struct(self, ruby_whisper, rw);
206
 
 
 
 
207
  if (!rb_respond_to(whisper_model_file_path, id_to_s)) {
208
  rb_raise(rb_eRuntimeError, "Expected file path to model to initialize Whisper::Context");
209
  }
@@ -1733,6 +1739,7 @@ void Init_whisper() {
1733
  id_length = rb_intern("length");
1734
  id_next = rb_intern("next");
1735
  id_new = rb_intern("new");
 
1736
 
1737
  mWhisper = rb_define_module("Whisper");
1738
  cContext = rb_define_class_under(mWhisper, "Context", rb_cObject);
 
45
  static ID id_length;
46
  static ID id_next;
47
  static ID id_new;
48
+ static ID id_to_path;
49
 
50
  static bool is_log_callback_finalized = false;
51
 
 
195
 
196
  /*
197
  * call-seq:
198
+ * new(Whisper::Model["base.en"]) -> Whisper::Context
199
  * new("path/to/model.bin") -> Whisper::Context
200
+ * new(Whisper::Model::URI.new("https://example.net/uri/of/model.bin")) -> Whisper::Context
201
  */
202
  static VALUE ruby_whisper_initialize(int argc, VALUE *argv, VALUE self) {
203
  ruby_whisper *rw;
 
207
  rb_scan_args(argc, argv, "01", &whisper_model_file_path);
208
  Data_Get_Struct(self, ruby_whisper, rw);
209
 
210
+ if (rb_respond_to(whisper_model_file_path, id_to_path)) {
211
+ whisper_model_file_path = rb_funcall(whisper_model_file_path, id_to_path, 0);
212
+ }
213
  if (!rb_respond_to(whisper_model_file_path, id_to_s)) {
214
  rb_raise(rb_eRuntimeError, "Expected file path to model to initialize Whisper::Context");
215
  }
 
1739
  id_length = rb_intern("length");
1740
  id_next = rb_intern("next");
1741
  id_new = rb_intern("new");
1742
+ id_to_path = rb_intern("to_path");
1743
 
1744
  mWhisper = rb_define_module("Whisper");
1745
  cContext = rb_define_class_under(mWhisper, "Context", rb_cObject);
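
The C change above makes `Whisper::Context#initialize` call `to_path` on its argument when the argument responds to it, and only then fall back to the existing `to_s` check. The same duck-typing rule can be sketched in plain Ruby. `resolve_model_path` and `LazyModel` are hypothetical names used for illustration; they are not part of the binding.

```ruby
require "pathname"

# Hypothetical Ruby rendering of the argument handling in ruby_whisper_initialize.
def resolve_model_path(arg)
  # Pathname, File, and any custom object may expose #to_path.
  arg = arg.to_path if arg.respond_to?(:to_path)
  unless arg.respond_to?(:to_s)
    raise "Expected file path to model to initialize Whisper::Context"
  end
  arg.to_s
end

# A stand-in for an object that resolves its location lazily,
# similar in spirit to Whisper::Model::URI#to_path in this commit.
LazyModel = Struct.new(:location) do
  def to_path
    location
  end
end

puts resolve_model_path("path/to/model.bin")
puts resolve_model_path(Pathname("path/to/model.bin"))
puts resolve_model_path(LazyModel.new("/tmp/ggml-base.bin"))
```

Because the check is on the method rather than the class, a downloader object can defer its work until the moment the context asks for a path.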
bindings/ruby/lib/whisper.rb ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ require "whisper.so"
2
+ require "whisper/model"
bindings/ruby/lib/whisper/model.rb ADDED
@@ -0,0 +1,159 @@
1
+ require "whisper.so"
2
+ require "uri"
3
+ require "net/http"
4
+ require "pathname"
5
+ require "io/console/size"
6
+
7
+ class Whisper::Model
8
+ class URI
9
+ def initialize(uri)
10
+ @uri = URI(uri)
11
+ end
12
+
13
+ def to_path
14
+ cache
15
+ cache_path.to_path
16
+ end
17
+
18
+ def clear_cache
19
+ path = cache_path
20
+ path.delete if path.exist?
21
+ end
22
+
23
+ private
24
+
25
+ def cache_path
26
+ base_cache_dir/@uri.host/@uri.path[1..]
27
+ end
28
+
29
+ def base_cache_dir
30
+ base = case RUBY_PLATFORM
31
+ when /mswin|mingw/
32
+ ENV.key?("LOCALAPPDATA") ? Pathname(ENV["LOCALAPPDATA"]) : Pathname(Dir.home)/"AppData/Local"
33
+ when /darwin/
34
+ Pathname(Dir.home)/"Library/Caches"
35
+ else
36
+ ENV.key?("XDG_CACHE_HOME") ? Pathname(ENV["XDG_CACHE_HOME"]) : Pathname(Dir.home)/".cache"
37
+ end
38
+ base/"whisper.cpp"
39
+ end
40
+
41
+ def cache
42
+ path = cache_path
43
+ headers = {}
44
+ headers["if-modified-since"] = path.mtime.httpdate if path.exist?
45
+ request @uri, headers
46
+ path
47
+ end
48
+
49
+ def request(uri, headers)
50
+ Net::HTTP.start uri.host, uri.port, use_ssl: uri.scheme == "https" do |http|
51
+ request = Net::HTTP::Get.new(uri, headers)
52
+ http.request request do |response|
53
+ case response
54
+ when Net::HTTPNotModified
55
+ # noop
56
+ when Net::HTTPOK
57
+ download response
58
+ when Net::HTTPRedirection
59
+ request URI(response["location"]), headers
60
+ else
61
+ raise "#{response.code} #{response.message}"
62
+ end
63
+ end
64
+ end
65
+ end
66
+
67
+ def download(response)
68
+ path = cache_path
69
+ path.dirname.mkpath unless path.dirname.exist?
70
+ downloading_path = Pathname("#{path}.downloading")
71
+ size = response.content_length
72
+ downloading_path.open "wb" do |file|
73
+ downloaded = 0
74
+ response.read_body do |chunk|
75
+ file << chunk
76
+ downloaded += chunk.bytesize
77
+ show_progress downloaded, size
78
+ end
79
+ end
80
+ downloading_path.rename path
81
+ end
82
+
83
+ def show_progress(current, size)
84
+ return unless size
85
+
86
+ unless @prev
87
+ @prev = Time.now
88
+ $stderr.puts "Downloading #{@uri}"
89
+ end
90
+
91
+ now = Time.now
92
+ return if now - @prev < 1 && current < size
93
+
94
+ progress_width = 20
95
+ progress = current.to_f / size
96
+ arrow_length = progress * progress_width
97
+ arrow = "=" * (arrow_length - 1) + ">" + " " * (progress_width - arrow_length)
98
+ line = "[#{arrow}] (#{format_bytesize(current)} / #{format_bytesize(size)})"
99
+ padding = ' ' * ($stderr.winsize[1] - line.size)
100
+ $stderr.print "\r#{line}#{padding}"
101
+ $stderr.puts if current >= size
102
+ @prev = now
103
+ end
104
+
105
+ def format_bytesize(bytesize)
106
+ return "0.0 B" if bytesize.zero?
107
+
108
+ units = %w[B KiB MiB GiB TiB]
109
+ exp = (Math.log(bytesize) / Math.log(1024)).to_i
110
+ format("%.1f %s", bytesize.to_f / 1024 ** exp, units[exp])
111
+ end
112
+ end
113
+
114
+ @names = {}
115
+ %w[
116
+ tiny
117
+ tiny.en
118
+ tiny-q5_1
119
+ tiny.en-q5_1
120
+ tiny-q8_0
121
+ base
122
+ base.en
123
+ base-q5_1
124
+ base.en-q5_1
125
+ base-q8_0
126
+ small
127
+ small.en
128
+ small.en-tdrz
129
+ small-q5_1
130
+ small.en-q5_1
131
+ small-q8_0
132
+ medium
133
+ medium.en
134
+ medium-q5_0
135
+ medium.en-q5_0
136
+ medium-q8_0
137
+ large-v1
138
+ large-v2
139
+ large-v2-q5_0
140
+ large-v2-q8_0
141
+ large-v3
142
+ large-v3-q5_0
143
+ large-v3-turbo
144
+ large-v3-turbo-q5_0
145
+ large-v3-turbo-q8_0
146
+ ].each do |name|
147
+ @names[name] = URI.new("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-#{name}.bin")
148
+ end
149
+
150
+ class << self
151
+ def [](name)
152
+ @names[name]
153
+ end
154
+
155
+ def preconverted_model_names
156
+ @names.keys
157
+ end
158
+ end
159
+ end
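
The progress display in `model.rb` combines a binary-unit size formatter with a fixed-width arrow bar. The two pieces can be sketched as pure functions that are easy to check in isolation. `progress_bar` is a hypothetical name, and the clamping of the unit index and of the fill width are additions made here for safety, not behavior taken from the commit.

```ruby
# Format a byte count with binary units, as format_bytesize does in the diff.
def format_bytesize(bytesize)
  return "0.0 B" if bytesize.zero?

  units = %w[B KiB MiB GiB TiB]
  exp = (Math.log(bytesize) / Math.log(1024)).to_i
  exp = [exp, units.size - 1].min # addition: never index past the last unit
  format("%.1f %s", bytesize.to_f / 1024**exp, units[exp])
end

# Hypothetical helper: render a bar such as "[=========>          ]".
def progress_bar(current, size, width: 20)
  # Work with an integer fill count so string repetition never sees a negative value.
  filled = (current.to_f / size * width).floor.clamp(0, width)
  head = filled.zero? ? "" : "=" * (filled - 1) + ">"
  "[#{head}#{' ' * (width - filled)}]"
end

puts "#{progress_bar(50, 100)} (#{format_bytesize(1536)} / #{format_bytesize(5 * 1024**2)})"
```

Keeping the rendering separate from `$stderr` output also makes it possible to skip the bar when the stream is not a terminal.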
bindings/ruby/tests/jfk_reader/jfk_reader.c CHANGED
@@ -60,49 +60,9 @@ static const rb_memory_view_entry_t jfk_reader_view_entry = {
60
  jfk_reader_memory_view_available_p
61
  };
62
 
63
- static VALUE
64
- read_jfk(int argc, VALUE *argv, VALUE obj)
65
- {
66
- const char *audio_path_str = StringValueCStr(argv[0]);
67
- const int n_samples = 176000;
68
-
69
- short samples[n_samples];
70
- FILE *file = fopen(audio_path_str, "rb");
71
-
72
- fseek(file, 78, SEEK_SET);
73
- fread(samples, sizeof(short), n_samples, file);
74
- fclose(file);
75
-
76
- VALUE rb_samples = rb_ary_new2(n_samples);
77
- for (int i = 0; i < n_samples; i++) {
78
- rb_ary_push(rb_samples, INT2FIX(samples[i]));
79
- }
80
-
81
- VALUE rb_data = rb_ary_new2(n_samples);
82
- for (int i = 0; i < n_samples; i++) {
83
- rb_ary_push(rb_data, DBL2NUM(samples[i]/32768.0));
84
- }
85
-
86
- float data[n_samples];
87
- for (int i = 0; i < n_samples; i++) {
88
- data[i] = samples[i]/32768.0;
89
- }
90
- void *c_data = (void *)data;
91
- VALUE rb_void = rb_enc_str_new((const char *)c_data, sizeof(data), rb_ascii8bit_encoding());
92
-
93
- VALUE rb_result = rb_ary_new3(3, rb_samples, rb_data, rb_void);
94
- return rb_result;
95
- }
96
-
97
  void Init_jfk_reader(void)
98
  {
99
  VALUE cJFKReader = rb_define_class("JFKReader", rb_cObject);
100
  rb_memory_view_register(cJFKReader, &jfk_reader_view_entry);
101
  rb_define_method(cJFKReader, "initialize", jfk_reader_initialize, 1);
102
-
103
-
104
- rb_define_global_function("read_jfk", read_jfk, -1);
105
-
106
-
107
-
108
  }
 
60
  jfk_reader_memory_view_available_p
61
  };
62
 
63
  void Init_jfk_reader(void)
64
  {
65
  VALUE cJFKReader = rb_define_class("JFKReader", rb_cObject);
66
  rb_memory_view_register(cJFKReader, &jfk_reader_view_entry);
67
  rb_define_method(cJFKReader, "initialize", jfk_reader_initialize, 1);
 
 
 
 
 
 
68
  }
bindings/ruby/tests/test_model.rb CHANGED
@@ -1,4 +1,5 @@
1
  require_relative "helper"
 
2
 
3
  class TestModel < TestBase
4
  def test_model
@@ -41,4 +42,23 @@ class TestModel < TestBase
41
  assert_equal 1, model.ftype
42
  assert_equal "base", model.type
43
  end
44
  end
 
1
  require_relative "helper"
2
+ require "pathname"
3
 
4
  class TestModel < TestBase
5
  def test_model
 
42
  assert_equal 1, model.ftype
43
  assert_equal "base", model.type
44
  end
45
+
46
+ def test_pathname
47
+ path = Pathname(MODEL)
48
+ whisper = Whisper::Context.new(path)
49
+ model = whisper.model
50
+
51
+ assert_equal 51864, model.n_vocab
52
+ assert_equal 1500, model.n_audio_ctx
53
+ assert_equal 512, model.n_audio_state
54
+ assert_equal 8, model.n_audio_head
55
+ assert_equal 6, model.n_audio_layer
56
+ assert_equal 448, model.n_text_ctx
57
+ assert_equal 512, model.n_text_state
58
+ assert_equal 8, model.n_text_head
59
+ assert_equal 6, model.n_text_layer
60
+ assert_equal 80, model.n_mels
61
+ assert_equal 1, model.ftype
62
+ assert_equal "base", model.type
63
+ end
64
  end