whitphx (HF Staff) committed
Commit f0a7df0 · verified · 1 Parent(s): 1534cfe

Add/update the quantized ONNX model files and README.md for Transformers.js v3


## Applied Quantizations

### ✅ Based on `model.onnx` *with* slimming

↳ ✅ `int8`: `model_int8.onnx` (added)
↳ ✅ `uint8`: `model_uint8.onnx` (added)
↳ ✅ `q4`: `model_q4.onnx` (added)
↳ ✅ `q4f16`: `model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `model_bnb4.onnx` (added)


README.md CHANGED
@@ -48,7 +48,9 @@ console.log(output.tolist());
 
 By default, an 8-bit quantized version of the model is used, but you can choose to use the full-precision (fp32) version by specifying `{ dtype: 'fp32' }` in the `pipeline` function:
 ```js
-const extractor = await pipeline('feature-extraction', 'Xenova/gte-small', { dtype: 'fp32' });
+const extractor = await pipeline('feature-extraction', 'Xenova/gte-small', {
+  dtype: 'fp32' // Options: "fp32", "fp16", "q8", "q4"
+});
 ```
 
 ---
onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:06bf293aba7dc80ddaab6c15fc647310302504d502d504fff773a3f107116986
+size 60147542
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f1337f30686b7e7a410ec5b3ff2c1e814c74d0c92ef69be3512eab1e9ce545b0
+size 33760831
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:21f088eba0a3a6942efbd11ac4bf6fa697c5fcbd2ea81d27764f22df6d873fe1
+size 61474190
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c55901040c7ebbc26df6933a54bb8feb79053496153c06dc1b013b0406278e0c
+size 36190171
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bbebccc991415aa73dec524b3dca5f8b51eaad2f23b0be374f146c739aa6f69b
+size 33760859