# 📸 MobileCLIP-B Zero-Shot Image Classifier

*Hugging Face Inference Endpoint*

Production-ready wrapper around Apple's MobileCLIP-B checkpoint. Handles image → text similarity in a single fast call.
## 📑 Contents

- Features
- Repository layout
- Quick start (local smoke-test)
- Calling the deployed endpoint
- How it works
- Updating the label set
- License
## ✨ Features

| Feature | This repo |
|---|---|
| Model | MobileCLIP-B (`datacompdr` checkpoint) |
| Branch fusion | `reparameterize_model` baked in |
| Mixed precision | FP16 on GPU, FP32 on CPU |
| Pre-computed text features | One-time encoding of the prompts in `items.json` |
| Per-request work | Only image decoding → `encode_image` → softmax |
| Latency (A10G) | < 30 ms once the image arrives |
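The fusion and precision rows above boil down to a few load-time lines. A minimal sketch, assuming open-clip-torch exposes the MobileCLIP-B config under the `datacompdr` pretrained tag (illustrative, not `handler.py` verbatim):

```python
# Hedged sketch of the load-time setup implied by the table above.
import torch
import open_clip
from reparam import reparameterize_model  # the 60-line local copy of Apple's helper

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # FP16 on GPU, FP32 on CPU

model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr"
)
model = reparameterize_model(model.eval())  # bake the MobileOne branch fusion in
model = model.to(device, dtype)
```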
## 📁 Repository layout

| Path | Purpose |
|---|---|
| `handler.py` | HF entry-point (loads the model + text cache, serves requests) |
| `reparam.py` | 60-line stand-alone copy of Apple's `reparameterize_model` |
| `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json` | Your label set (`id`, `name`, `prompt` per line) |
| `README.md` | This document |
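For orientation, `handler.py` implements the standard custom-handler contract for Inference Endpoints. A bare skeleton (not the real file):

```python
# Skeleton of the Hugging Face custom-handler contract (illustrative only).
from typing import Any

class EndpointHandler:
    def __init__(self, path: str = ""):
        # Runs once per replica: load + fuse the model, encode items.json prompts.
        ...

    def __call__(self, data: dict[str, Any]) -> list[dict[str, Any]]:
        # Runs per request: data["inputs"]["image"] holds the base-64 image.
        ...
```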
## 🚀 Quick start (local smoke-test)

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, pathlib
import handler

app = handler.EndpointHandler()
img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])  # top-5 classes
PY
```
## 🌐 Calling the deployed endpoint

```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"

python - "$IMG" <<'PY'
import base64, json, os, requests, sys

url = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img = sys.argv[1]

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
print(json.dumps(resp.json()[:5], indent=2))
PY
```
**Response example**

```json
[
  { "id": 23, "label": "cat", "score": 0.92 },
  { "id": 11, "label": "tiger cat", "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
```
## ⚙️ How it works

**Startup (runs once per replica)**

1. Downloads / loads MobileCLIP-B (`datacompdr`).
2. Fuses the MobileOne branches via `reparam.py`.
3. Reads `items.json` and encodes every prompt into an `[N, 512]` tensor (sketched below).
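A hedged sketch of step 3, the one-time prompt encoding (illustrative names; assumes open-clip-torch's MobileCLIP-B tokenizer and a 512-dimensional embedding space):

```python
# One-time text encoding at startup: items.json prompts -> cached [N, 512] tensor.
import json

import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms("MobileCLIP-B", pretrained="datacompdr")
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")
model.eval()

with open("items.json") as f:
    items = json.load(f)

with torch.no_grad():
    tokens = tokenizer([it["prompt"] for it in items])   # [N, 77] token ids
    text_feats = model.encode_text(tokens)               # [N, 512]
    text_feats /= text_feats.norm(dim=-1, keepdim=True)  # unit-normalise each row
```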
**Per request**

1. Decodes the base-64 JPEG/PNG.
2. Applies the OpenCLIP preprocessing (224 × 224 center-crop + normalise).
3. Encodes the image, normalises, and computes cosine similarity against the cached text matrix.
4. Returns a sorted `[{id, label, score}, …]` list (see the sketch below).
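A hedged sketch of that per-request path (the `classify` helper is illustrative; `model`, `preprocess`, `text_feats`, and `items` come from the startup sketches above, and device/dtype handling is elided for brevity):

```python
# Per-request path: base-64 image -> preprocess -> encode -> cosine sim -> softmax.
import base64
import io

import torch
from PIL import Image  # Pillow is pulled in by torchvision

def classify(img_b64, model, preprocess, text_feats, items):
    img = Image.open(io.BytesIO(base64.b64decode(img_b64))).convert("RGB")
    batch = preprocess(img).unsqueeze(0)  # [1, 3, 224, 224]
    with torch.no_grad():
        feats = model.encode_image(batch)        # [1, 512]
        feats /= feats.norm(dim=-1, keepdim=True)
        # Fixed 100x logit scale is a common CLIP convention; the real handler
        # may use the model's learned logit_scale instead.
        probs = (100.0 * feats @ text_feats.T).softmax(dim=-1)[0]
    order = probs.argsort(descending=True).tolist()
    return [
        {"id": items[i]["id"], "label": items[i]["name"], "score": float(probs[i])}
        for i in order
    ]
```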
## 🏷️ Updating the label set

Simply edit `items.json`, push, and redeploy:

```json
[
  { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
  { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```

No code changes are required; the handler re-encodes the prompts at start-up.
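If the label set grows long, it can be convenient to generate the file rather than hand-edit it. A small hypothetical helper:

```python
# Hypothetical helper: regenerate items.json from a plain list of class names.
import json

names = ["cat", "dog"]  # extend with your own classes
items = [
    {"id": i, "name": n, "prompt": f"a photo of a {n}"}
    for i, n in enumerate(names)
]
with open("items.json", "w") as f:
    json.dump(items, f, indent=2)
```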
## ⚖️ License

- Weights / data → Apple AMLR (see `LICENSE_weights_data`)
- This wrapper code → MIT

Maintained with ❤️ by Your-Team · Aug 2025