sayantan47 committed on
Commit 4e34b01 · verified · 1 Parent(s): 1e6a083

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -1,35 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,130 +1,130 @@
---
language: en
license: mit
library_name: onnxruntime
tags:
- clip
- vision
- zero-shot-classification
- image-text-similarity
- onnx
- vit-b32
pipeline_tag: zero-shot-image-classification
widget:
- text: a cat
  example_image: >-
    https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat.png
- text: a dog
  example_image: >-
    https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dog.png
base_model:
- openai/clip-vit-base-patch32
---

# **CLIP ViT-B/32 (ONNX)**

This repository contains the **ONNX export of OpenAI’s CLIP model (ViT-B/32)**, optimized for inference with [ONNX Runtime](https://onnxruntime.ai/). It supports **fast image-text similarity and zero-shot image classification** without requiring PyTorch or TensorFlow.

---

## **Model Details**

* **Base Model:** [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32)
* **Export Format:** ONNX
* **Architecture:** Vision Transformer (ViT-B/32)
* **File Size:** ~600 MB
* **Use Cases:** Zero-shot classification, image-text similarity, and retrieval

---

## **Files Included**

```
model.onnx                  # ONNX version of CLIP (ViT-B/32)
config.json                 # Model configuration
preprocessor_config.json    # Preprocessing steps for the CLIPProcessor
tokenizer.json              # Tokenizer vocabulary and merges
vocab.json                  # BPE vocabulary
merges.txt                  # BPE merges
special_tokens_map.json     # Special tokens mapping
tokenizer_config.json       # Tokenizer configuration
```
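
To grab everything listed above in one go, here is a minimal sketch using `huggingface_hub` (the `local_dir` value is an illustrative path, not part of this repo):

```python
from huggingface_hub import snapshot_download

# Download the full repository (model, tokenizer, and config files) locally.
local_dir = snapshot_download(
    repo_id="sayantan47/clip-vit-b32-onnx",
    local_dir="clip-vit-b32-onnx-local",  # hypothetical target directory
)
print("Files downloaded to:", local_dir)
```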

---

## **How to Use**

### **1. Install Dependencies**

```bash
pip install onnxruntime transformers huggingface_hub pillow numpy
```

---

### **2. Load the Model and Run Inference**

```python
from huggingface_hub import hf_hub_download
from transformers import CLIPProcessor
import onnxruntime as ort
from PIL import Image
import numpy as np

# Download the ONNX model from this repo
repo_id = "sayantan47/clip-vit-b32-onnx"
onnx_path = hf_hub_download(repo_id=repo_id, filename="model.onnx")

# Create an ONNX Runtime session
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Load the CLIP processor (tokenizer + image preprocessor)
processor = CLIPProcessor.from_pretrained(repo_id)

# Example input
image = Image.open("example.jpg").convert("RGB")
texts = ["a dog", "a cat"]

# Preprocess
inputs = processor(text=texts, images=image, return_tensors="np", padding=True)

# Ensure integer inputs are int64, as the ONNX graph expects
inputs = {k: (v.astype(np.int64) if v.dtype == np.int32 else v) for k, v in inputs.items()}

# Run inference
outputs = session.run(None, inputs)
logits_per_image = outputs[0]

# Numerically stable softmax over the text candidates
logits = logits_per_image - logits_per_image.max(-1, keepdims=True)
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
print("Probabilities:", probs)
```
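
The example above assumes the first output is `logits_per_image`. If in doubt, ONNX Runtime can list the graph's inputs and outputs directly; a short sketch reusing `session` and `inputs` from the code above (the output name `logits_per_image` is how Optimum typically names it, so verify it against the printed list):

```python
# Print the graph signature to confirm input/output names, shapes, and dtypes
for inp in session.get_inputs():
    print("input :", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)

# Outputs can also be requested by name instead of positionally
logits = session.run(["logits_per_image"], inputs)[0]
```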

---

## **Applications**

* **Zero-Shot Classification:** Classify images by comparing them against arbitrary textual descriptions.
* **Image Similarity:** Compare embeddings between two images, or between images and text (see the sketch below).
* **Search Engines:** Use as the backbone for image-text retrieval systems.
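
A minimal sketch of the embedding-based similarity workflow, reusing `session`, `inputs`, and `np` from the example above. It assumes the export also exposes `image_embeds` and `text_embeds` outputs (check `session.get_outputs()` as shown earlier):

```python
# Fetch the embedding outputs by (assumed) name
image_embeds, text_embeds = session.run(["image_embeds", "text_embeds"], inputs)

# L2-normalize, then cosine similarity between the image and each text prompt
image_embeds = image_embeds / np.linalg.norm(image_embeds, axis=-1, keepdims=True)
text_embeds = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
similarity = image_embeds @ text_embeds.T
print("Cosine similarity (image vs. each text):", similarity)
```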

---

## **ONNX Runtime Performance**

* **CPU-only:** Works out of the box with `onnxruntime` on CPUs.
* **GPU:** To use CUDA, install `onnxruntime-gpu` and ensure you have **CUDA 12 and cuDNN 9** installed.

```bash
pip install onnxruntime-gpu
```
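
With `onnxruntime-gpu` installed, the only change to the example above is the provider list; a small sketch that prefers CUDA and falls back to CPU if it is unavailable:

```python
import onnxruntime as ort

session = ort.InferenceSession(
    onnx_path,  # path returned by hf_hub_download above
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print("Active providers:", session.get_providers())
```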

---

## **Export Command Used**

The model was exported using [Hugging Face Optimum](https://huggingface.co/docs/optimum/index) with:

```bash
python -m optimum.exporters.onnx --model=openai/clip-vit-base-patch32 onnx_model/
```
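
To sanity-check an export like this, one option (a small sketch using the separate `onnx` package; `onnx_model/model.onnx` is the path produced by the command above) is to validate the graph and print its opsets:

```python
import onnx

model = onnx.load("onnx_model/model.onnx")
onnx.checker.check_model(model)  # raises if the graph is structurally invalid
print("Opset versions:", [opset.version for opset in model.opset_import])
```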

---
merges.txt CHANGED
The diff for this file is too large to render. See raw diff
 
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e3796fadb6cb16ad79ff34c0873d29cd9ce1578ec621286c13072c6f1014346
+ size 605593696
onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e70d5ff773c939a1fcfbe135a344141a8711c617af1914cee33c278649cea15
+ size 181695925
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b33a72860c26713ff564d36a162be4e968ee1e50b2418f449076c067735d4fab
+ size 303515168
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f518bedb1851294737a141e06149883cb289160760224f2da5498886e49d5cb
+ size 189403477
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0fa5651801a45889d15576d445b23172f706be5b5d17f6d96a61b486cf4a5252
+ size 125818295
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0898a3facfdb27f0a041e57649b4989cfd094e4a0040d6ae75ed69917dfc7328
+ size 153695702
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ac011172c8c022937bb83dad2e8fc207f52f19972b36e14808cc3c8042c4e60
+ size 152738540
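
The `onnx/` folder added in this commit also carries quantized variants of the model (fp16, 4-bit, uint8, and a generic quantized build). A minimal sketch of loading one of them instead of the full-precision `model.onnx` (the fp16 file is used purely as an example):

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Fetch a smaller variant listed above and run it exactly like model.onnx
fp16_path = hf_hub_download(
    repo_id="sayantan47/clip-vit-b32-onnx",
    filename="onnx/model_fp16.onnx",
)
session = ort.InferenceSession(fp16_path, providers=["CPUExecutionProvider"])
```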
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff