likaixin/IconClip-ViT-L-14

likaixin committed on May 15

Commit fda7cb4 · verified · 1 Parent(s): c84e1d1

Update README.md

Files changed (1)

README.md +37 -1

@@ -36,4 +36,40 @@ model-index:
 A CLIP ViT-B/32 model trained with the [IconStack dataset](https://huggingface.co/datasets/likaixin/IconStack-Captions-48M) using [OpenCLIP](https://github.com/mlfoundations/open_clip).
-It scores 80.24% on zero-shot classification on [icon-dataset](https://huggingface.co/datasets/likaixin/ui-icon-dataset).

+It scores 80.24% on zero-shot classification on [icon-dataset](https://huggingface.co/datasets/likaixin/ui-icon-dataset).
+## Installation
+You need to install `open_clip` to use this model:
+```bash
+pip install open_clip_torch
+```
+## Icon-to-Text Zero-Shot Classification
+```python
+import torch
+from PIL import Image
+import open_clip
+CLIP_TEXT_TEMPLATE = "an icon of {}"
+ICON_CLASSES = ["add", "close", "play", ...]  # Replace with your own class names
+model_checkpoint = "<path_to_your_local_model>"
+# The architecture name passed here must match the checkpoint being loaded
+model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained=model_checkpoint)
+model.eval()
+tokenizer = open_clip.get_tokenizer('ViT-B-32')
+# Preprocess the icon and add a batch dimension
+image = preprocess(Image.open("icon.png")).unsqueeze(0)
+# Build one text prompt per class name
+text = tokenizer([CLIP_TEXT_TEMPLATE.format(cls) for cls in ICON_CLASSES])
+with torch.no_grad(), torch.autocast("cuda"):
+    image_features = model.encode_image(image)
+    text_features = model.encode_text(text)
+    # L2-normalize so the dot product below is a cosine similarity
+    image_features /= image_features.norm(dim=-1, keepdim=True)
+    text_features /= text_features.norm(dim=-1, keepdim=True)
+    # Scale the similarities and convert them to probabilities over the classes
+    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+print("Label probs:", text_probs)  # prints something like: [[1., 0., 0., ...]]
+```
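
The scoring step in the added README snippet (L2-normalize both embeddings, scale the cosine similarities by 100, apply a softmax) can be illustrated without loading the model. The following is a minimal sketch in plain Python, assuming made-up 3-dimensional feature vectors in place of real CLIP embeddings; `zero_shot_probs` and `top_k` are hypothetical helper names introduced for illustration and are not part of `open_clip`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_feat, text_feats, scale=100.0):
    """Cosine similarity between one image and each text prompt, scaled, then softmax."""
    img = l2_normalize(image_feat)
    logits = []
    for text_feat in text_feats:
        txt = l2_normalize(text_feat)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    return softmax(logits)


def top_k(probs, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy data (assumption): stand-ins for the outputs of encode_image / encode_text.
classes = ["add", "close", "play"]
image_feat = [0.9, 0.1, 0.0]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_feat, text_feats)
for name, p in top_k(probs, classes, k=2):
    print(f"{name}: {p:.4f}")
```

With real embeddings, the same ranking could be obtained from `text_probs` in the README snippet by pairing each probability with its entry in `ICON_CLASSES`. The factor of 100 sharpens the softmax, which is why the printed probabilities tend to sit close to 0 or 1.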