πŸ“Έ MobileCLIP-B Zero-Shot Image Classifier

Hugging Face Inference Endpoint

Production-ready wrapper around Apple’s MobileCLIP-B checkpoint. Handles image β†’ text similarity in a single fast call.


✨ Features

| Feature | This repo |
|---|---|
| Model | MobileCLIP-B (datacompdr checkpoint) |
| Branch fusion | reparameterize_model baked in |
| Mixed precision | FP16 on GPU, FP32 on CPU |
| Pre-computed text feats | One-time encoding of the prompts in items.json |
| Per-request work | Only image decoding β†’ encode_image β†’ softmax |
| Latency (A10G) | < 30 ms once the image arrives |
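
The fusion, precision, and prompt-caching rows above translate roughly into the startup code below. This is a minimal sketch only: the "MobileCLIP-B" / "datacompdr" OpenCLIP identifiers and the reparam import are assumptions based on this README, and handler.py remains the source of truth.

import json, torch, open_clip
from reparam import reparameterize_model   # repo-local copy of Apple's helper

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype  = torch.float16 if device == "cuda" else torch.float32

# Load MobileCLIP-B and fuse its MobileOne branches for faster inference.
model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr"
)
model = reparameterize_model(model.eval()).to(device, dtype)
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")

# One-time text encoding: every prompt in items.json becomes one row of an [N, 512] matrix.
items = json.load(open("items.json"))
with torch.no_grad():
    tokens = tokenizer([item["prompt"] for item in items]).to(device)
    text_feats = model.encode_text(tokens).to(dtype)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)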

πŸ“ Repository layout

| Path | Purpose |
|---|---|
| handler.py | HF entry-point (loads model + text cache, serves requests) |
| reparam.py | 60-line stand-alone copy of Apple's reparameterize_model |
| requirements.txt | Minimal dep set (torch, torchvision, open-clip-torch) |
| items.json | Your label set (id, name, prompt per line) |
| README.md | This document |

πŸš€ Quick start (local smoke-test)

python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, handler, pathlib
app = handler.EndpointHandler()

img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])   # top-5 classes
PY

🌐 Calling the deployed endpoint

export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
export IMG="cat.jpg"

python - <<'PY'
import base64, json, os, requests
url   = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img   = os.environ["IMG"]

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type":  "application/json",
        "Accept":        "application/json",
    },
    json=payload,
    timeout=60,
)
print(json.dumps(resp.json()[:5], indent=2))
PY

Response example

[
  { "id": 23, "label": "cat",         "score": 0.92 },
  { "id": 11, "label": "tiger cat",   "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
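
The list comes back sorted by score, so picking the single best match on the client side is trivial. Continuing the Python request script above (resp holds the response):

result = resp.json()                  # the list shown above
best   = result[0]                    # highest-scoring entry; results are sorted by score
print(best["label"], best["score"])   # e.g. "cat 0.92"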

βš™οΈ How it works

  1. Startup (runs once per replica)

    • Downloads / loads MobileCLIP-B (datacompdr).
    • Fuses MobileOne branches via reparam.py.
    • Reads items.json and encodes every prompt β†’ [N,512] tensor.
  2. Per request

    • Decodes base-64 JPEG/PNG.
    • Applies OpenCLIP preprocessing (224 Γ— 224 center-crop + normalise).
    • Encodes the image, normalises, computes cosine similarity vs. cached text matrix.
    • Returns sorted [{id, label, score}, …].

πŸ”„ Updating the label set

Simply edit items.json, push, and redeploy.

[
  { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
  { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]

No code changes are required; the handler re-encodes prompts at start-up.
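
If you edit the file by hand, a quick sanity check before pushing can catch the usual mistakes (a hypothetical helper, not part of the repo):

import json

items = json.load(open("items.json"))
assert all({"id", "name", "prompt"} <= item.keys() for item in items), "missing keys"
assert len({item["id"] for item in items}) == len(items), "duplicate ids"
print(f"{len(items)} labels look valid")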


βš–οΈ License


Maintained with ❀️ by Your-Team β€” Aug 2025