πŸ“Έ MobileCLIP-B Zero-Shot Image Classifier

Hugging Face Inference Endpoint

Production-ready wrapper around Apple’s MobileCLIP-B checkpoint. Handles image β†’ text similarity in a single fast call.


✨ Features

| Feature | This repo |
|---|---|
| Model | MobileCLIP-B (datacompdr checkpoint) |
| Branch fusion | reparameterize_model baked in |
| Mixed precision | FP16 on GPU, FP32 on CPU |
| Pre-computed text feats | One-time encoding of the prompts in items.json |
| Per-request work | Only image decoding β†’ encode_image β†’ softmax |
| Latency (A10G) | < 30 ms once the image arrives |
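
The fusion, precision, and prompt-caching rows above translate roughly into the startup code below. This is a minimal sketch only: the "MobileCLIP-B" / "datacompdr" OpenCLIP identifiers and the reparam import are assumptions based on this README, and handler.py remains the source of truth.

import json, torch, open_clip
from reparam import reparameterize_model   # repo-local copy of Apple's helper

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype  = torch.float16 if device == "cuda" else torch.float32

# Load MobileCLIP-B and fuse its MobileOne branches for faster inference.
model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr"
)
model = reparameterize_model(model.eval()).to(device, dtype)
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")

# One-time text encoding: every prompt in items.json becomes one row of an [N, 512] matrix.
items = json.load(open("items.json"))
with torch.no_grad():
    tokens = tokenizer([item["prompt"] for item in items]).to(device)
    text_feats = model.encode_text(tokens).to(dtype)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)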

πŸ“ Repository layout

| Path | Purpose |
|---|---|
| handler.py | HF entry-point (loads model + text cache, serves requests) |
| reparam.py | 60-line stand-alone copy of Apple's reparameterize_model |
| requirements.txt | Minimal dep set (torch, torchvision, open-clip-torch) |
| items.json | Your label set (id, name, prompt per line) |
| README.md | This document |

πŸš€ Quick start (local smoke-test)

python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, handler, pathlib
app = handler.EndpointHandler()

img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])   # top-5 classes
PY

🌐 Calling the deployed endpoint

export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
export IMG="cat.jpg"

python - <<'PY'
import base64, json, os, requests
url   = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img   = os.environ["IMG"]

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type":  "application/json",
        "Accept":        "application/json",
    },
    json=payload,
    timeout=60,
)
print(json.dumps(resp.json()[:5], indent=2))
PY

Response example

[
  { "id": 23, "label": "cat",         "score": 0.92 },
  { "id": 11, "label": "tiger cat",   "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
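
The list comes back sorted by score, so picking the single best match on the client side is trivial. Continuing the Python request script above (resp holds the response):

result = resp.json()                  # the list shown above
best   = result[0]                    # highest-scoring entry; results are sorted by score
print(best["label"], best["score"])   # e.g. "cat 0.92"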

βš™οΈ How it works

  1. Startup (runs once per replica)

    • Downloads / loads MobileCLIP-B (datacompdr).
    • Fuses MobileOne branches via reparam.py.
    • Reads items.json and encodes every prompt β†’ [N,512] tensor.
  2. Per request

    • Decodes base-64 JPEG/PNG.
    • Applies OpenCLIP preprocessing (224 Γ— 224 center-crop + normalise).
    • Encodes the image, normalises, computes cosine similarity vs. cached text matrix.
    • Returns sorted [{id, label, score}, …].

πŸ”„ Updating the label set

Simply edit items.json, push, and redeploy.

[
  { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
  { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]

No code changes are required; the handler re-encodes prompts at start-up.
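
If you edit the file by hand, a quick sanity check before pushing can catch the usual mistakes (a hypothetical helper, not part of the repo):

import json

items = json.load(open("items.json"))
assert all({"id", "name", "prompt"} <= item.keys() for item in items), "missing keys"
assert len({item["id"] for item in items}) == len(items), "duplicate ids"
print(f"{len(items)} labels look valid")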


βš–οΈ License


Maintained with ❀️ by Your-Team β€” Aug 2025