Update README.md
README.md
CHANGED
@@ -36,4 +36,40 @@ model-index:
A CLIP ViT-B/32 model trained on the [IconStack dataset](https://huggingface.co/datasets/likaixin/IconStack-Captions-48M) using [OpenCLIP](https://github.com/mlfoundations/open_clip).

It scores 80.24% on zero-shot classification on the [icon-dataset](https://huggingface.co/datasets/likaixin/ui-icon-dataset).

## Installation

You need to install `open_clip` to use this model:

```bash
pip install open_clip_torch
```
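
Note that the PyPI package is named `open_clip_torch` while the module is imported as `open_clip`. As a quick sanity check after installing, a minimal sketch like the following reports whether the modules used in this README can be found. The helper name `is_installed` is hypothetical and not part of OpenCLIP; it relies only on the Python standard library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Import names (not PyPI names) of the packages used in the example code.
for name in ("torch", "PIL", "open_clip"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If `open_clip` is reported as missing, the `pip install` command probably ran in a different Python environment from the one executing the check.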

## Icon-to-Text Zero-Shot Classification

```python
import torch
from PIL import Image
import open_clip

CLIP_TEXT_TEMPLATE = "an icon of {}"
ICON_CLASSES = ["add", "close", "play"]  # Replace with your own class names

# Run on the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model_checkpoint = "<path_to_your_local_model>"
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained=model_checkpoint)
model = model.to(device).eval()
tokenizer = open_clip.get_tokenizer('ViT-B-32')

# Preprocess one icon and build one text prompt per class.
image = preprocess(Image.open("icon.png")).unsqueeze(0).to(device)
text = tokenizer([CLIP_TEXT_TEMPLATE.format(cls) for cls in ICON_CLASSES]).to(device)

# Mixed precision is only enabled when running on CUDA.
with torch.no_grad(), torch.autocast(device, enabled=(device == "cuda")):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints something like: [[1., 0., 0.]]
```
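
The probabilities printed above are ordered like `ICON_CLASSES`, so turning them into readable predictions means pairing each score with its class name and sorting. The following is a minimal sketch of that step, not part of the model or of OpenCLIP: the helper `top_k_labels` and the example scores are hypothetical, and it assumes the probabilities have already been converted to a plain list, for example with `text_probs[0].tolist()`.

```python
def top_k_labels(probs, class_names, k=3):
    """Return the k (class name, probability) pairs with the highest probability."""
    if len(probs) != len(class_names):
        raise ValueError("probs and class_names must have the same length")
    # Pair each class with its score, then sort by score in descending order.
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for illustration; real values would come from text_probs[0].tolist().
example_probs = [0.07, 0.90, 0.03]
example_classes = ["add", "close", "play"]

for label, prob in top_k_labels(example_probs, example_classes, k=2):
    print(f"{label}: {prob:.2f}")
```

Keeping this step in plain Python makes it independent of the device and precision used for inference.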