Add files using upload-large-folder tool
- .gitattributes +35 -35
- README.md +129 -129
- merges.txt +0 -0
- onnx/model.onnx +3 -0
- onnx/model_bnb4.onnx +3 -0
- onnx/model_fp16.onnx +3 -0
- onnx/model_q4.onnx +3 -0
- onnx/model_q4f16.onnx +3 -0
- onnx/model_quantized.onnx +3 -0
- onnx/model_uint8.onnx +3 -0
- tokenizer.json +0 -0
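The upload also adds several quantized ONNX variants under `onnx/` (`model_fp16`, `model_q4`, `model_q4f16`, `model_quantized`, `model_uint8`, `model_bnb4`) next to the full-precision export. A minimal sketch of loading one of these variants with ONNX Runtime, assuming the repo id `sayantan47/clip-vit-b32-onnx` given in the README below:

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Fetch one of the smaller variants added in this commit, e.g. model_quantized.onnx
onnx_path = hf_hub_download(
    repo_id="sayantan47/clip-vit-b32-onnx",
    filename="onnx/model_quantized.onnx",
)
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
```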
.gitattributes
CHANGED
@@ -1,35 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,130 +1,130 @@
---
language: en
license: mit
library_name: onnxruntime
tags:
- clip
- vision
- zero-shot-classification
- image-text-similarity
- onnx
- vit-b32
pipeline_tag: zero-shot-image-classification
widget:
- text: a cat
  example_image: >-
    https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat.png
- text: a dog
  example_image: >-
    https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dog.png
base_model:
- openai/clip-vit-base-patch32
---

# **CLIP ViT-B/32 (ONNX)**

This repository contains the **ONNX-exported version of OpenAI’s CLIP model (ViT-B/32)**, optimized for inference using [ONNX Runtime](https://onnxruntime.ai/). It supports **fast image-text similarity and zero-shot classification** without requiring PyTorch or TensorFlow.

---

## **Model Details**

* **Base Model:** [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32)
* **Export Format:** ONNX
* **Architecture:** Vision Transformer (ViT-B/32)
* **File Size:** ~600 MB
* **Use Case:** Zero-shot classification, image-text similarity, and retrieval.

---

## **Files Included**

```
model.onnx                 # ONNX version of CLIP (ViT-B/32)
config.json                # Model configuration
preprocessor_config.json   # Preprocessing steps for the CLIPProcessor
tokenizer.json             # Tokenizer vocabulary and merges
vocab.json                 # BPE vocabulary
merges.txt                 # BPE merges
special_tokens_map.json    # Special tokens mapping
tokenizer_config.json      # Tokenizer configuration
```
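Rather than fetching files one by one, the whole repository (model plus tokenizer and preprocessor files) can be pulled in a single call. A minimal sketch using `huggingface_hub.snapshot_download`; the local directory name is illustrative:

```python
from huggingface_hub import snapshot_download

# Download every file listed above into one local folder;
# repeated calls reuse the cached copy.
local_dir = snapshot_download(
    repo_id="sayantan47/clip-vit-b32-onnx",
    local_dir="clip-vit-b32-onnx",  # illustrative target path
)
print("Files available under:", local_dir)
```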

---

## **How to Use**

### **1. Install Dependencies**

```bash
pip install onnxruntime transformers huggingface_hub pillow numpy
```

---

### **2. Load the Model and Processor**

```python
from huggingface_hub import hf_hub_download
from transformers import CLIPProcessor
import onnxruntime as ort
from PIL import Image
import numpy as np

# Download ONNX model from this repo
repo_id = "sayantan47/clip-vit-b32-onnx"
onnx_path = hf_hub_download(repo_id=repo_id, filename="model.onnx")

# Load ONNX Runtime session
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Load CLIP Processor (tokenizer + image preprocessor)
processor = CLIPProcessor.from_pretrained(repo_id)

# Example input
image = Image.open("example.jpg")
texts = ["a dog", "a cat"]

# Preprocess
inputs = processor(text=texts, images=image, return_tensors="np", padding=True)

# Ensure correct dtype for ONNX (token ids and masks must be int64)
inputs = {k: (v.astype(np.int64) if v.dtype == np.int32 else v) for k, v in inputs.items()}

# Run inference
outputs = session.run(None, inputs)
logits_per_image = outputs[0]

# Numerically stable softmax over the candidate texts
logits_per_image = logits_per_image - logits_per_image.max(-1, keepdims=True)
probs = np.exp(logits_per_image) / np.exp(logits_per_image).sum(-1, keepdims=True)
print("Probabilities:", probs)
```

---

## **Applications**

* **Zero-Shot Classification:** Classify images by comparing them to textual descriptions.
* **Image Similarity:** Compare embeddings between two images or between images and text (see the sketch after this list).
* **Search Engines:** Use as the backbone for image-text retrieval systems.
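The export also returns the underlying image and text embeddings alongside the similarity logits, which is what a retrieval index or image-to-image comparison needs. A minimal sketch that reuses `session` and `inputs` from the snippet above and looks embeddings up by output name; the names `image_embeds` and `text_embeds` are assumptions based on the standard CLIP ONNX export and should be verified against `session.get_outputs()`:

```python
import numpy as np

# Map output names to the arrays returned by session.run()
output_names = [o.name for o in session.get_outputs()]
results = dict(zip(output_names, session.run(None, inputs)))

# Assumed output names -- check output_names if the lookup fails
image_embeds = results["image_embeds"]  # (num_images, 512) for ViT-B/32
text_embeds = results["text_embeds"]    # (num_texts, 512)

# L2-normalise and take dot products to get cosine similarities
image_embeds = image_embeds / np.linalg.norm(image_embeds, axis=-1, keepdims=True)
text_embeds = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
similarity = image_embeds @ text_embeds.T
print("Cosine similarities:", similarity)
```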

---

## **ONNX Runtime Performance**

* **CPU-only:** Works out of the box with `onnxruntime` on CPUs.
* **GPU:** To use CUDA, install `onnxruntime-gpu` and ensure you have **CUDA 12 and cuDNN 9** installed (see the session sketch below).

```bash
pip install onnxruntime-gpu
```
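A minimal sketch of creating a GPU session with CPU fallback, assuming `onnxruntime-gpu` and a matching CUDA/cuDNN setup are installed; `onnx_path` is the file downloaded in the usage example above:

```python
import onnxruntime as ort

# Prefer CUDA; ONNX Runtime falls back to CPU if the CUDA provider
# cannot be initialised on this machine.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
session = ort.InferenceSession(onnx_path, providers=providers)
print("Active providers:", session.get_providers())
```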

---

## **Export Command Used**

The model was exported using [Hugging Face Optimum](https://huggingface.co/docs/optimum/index) with:

```bash
python -m optimum.exporters.onnx --model=openai/clip-vit-base-patch32 onnx_model/
```

---
merges.txt
CHANGED
The diff for this file is too large to render.
onnx/model.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e3796fadb6cb16ad79ff34c0873d29cd9ce1578ec621286c13072c6f1014346
+size 605593696
onnx/model_bnb4.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3e70d5ff773c939a1fcfbe135a344141a8711c617af1914cee33c278649cea15
+size 181695925
onnx/model_fp16.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b33a72860c26713ff564d36a162be4e968ee1e50b2418f449076c067735d4fab
+size 303515168
onnx/model_q4.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1f518bedb1851294737a141e06149883cb289160760224f2da5498886e49d5cb
+size 189403477
onnx/model_q4f16.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0fa5651801a45889d15576d445b23172f706be5b5d17f6d96a61b486cf4a5252
+size 125818295
onnx/model_quantized.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0898a3facfdb27f0a041e57649b4989cfd094e4a0040d6ae75ed69917dfc7328
+size 153695702
onnx/model_uint8.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4ac011172c8c022937bb83dad2e8fc207f52f19972b36e14808cc3c8042c4e60
+size 152738540
tokenizer.json
CHANGED
The diff for this file is too large to render.