---
license: apache-2.0
datasets:
- cj-mills/hagrid-classification-512p-no-gesture-150k
language:
- en
base_model:
- google/siglip2-so400m-patch14-384
pipeline_tag: image-classification
library_name: transformers
tags:
- Gesture
- Classification
- SigLIP2
- 19:Styles
- Vision-Encoder
---

# **Hand-Gesture-19**

> **Hand-Gesture-19** is a vision-language encoder model fine-tuned from **google/siglip2-so400m-patch14-384** for a single-label image classification task. It is designed to classify hand gesture images into nineteen gesture categories using the **SiglipForImageClassification** architecture.

```py
Classification Report:
                 precision    recall  f1-score   support

           call     0.9889    0.9739    0.9813      6939
        dislike     0.9892    0.9863    0.9877      7028
           fist     0.9956    0.9923    0.9940      6882
           four     0.9632    0.9653    0.9643      7183
           like     0.9668    0.9855    0.9760      6823
           mute     0.9848    0.9976    0.9912      7139
     no_gesture     0.9960    0.9957    0.9958     27823
             ok     0.9872    0.9831    0.9852      6924
            one     0.9817    0.9854    0.9835      7062
           palm     0.9793    0.9848    0.9820      7050
          peace     0.9723    0.9635    0.9679      6965
 peace_inverted     0.9806    0.9836    0.9821      6876
           rock     0.9853    0.9865    0.9859      6883
           stop     0.9614    0.9901    0.9756      6893
  stop_inverted     0.9933    0.9712    0.9821      7142
          three     0.9712    0.9478    0.9594      6940
         three2     0.9785    0.9799    0.9792      6870
         two_up     0.9848    0.9863    0.9855      7346
two_up_inverted     0.9855    0.9871    0.9863      6967

       accuracy                         0.9833    153735
      macro avg     0.9813    0.9814    0.9813    153735
   weighted avg     0.9833    0.9833    0.9833    153735
```
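The table above follows the output format of scikit-learn's `classification_report`. A minimal sketch of how such an evaluation could be reproduced is shown below; the split name, the `image`/`label` column names, and the assumption that the dataset's integer labels match the model's class indices are guesses that may differ from the actual training setup.

```python
# Sketch: re-computing a classification report for this checkpoint.
# Assumptions: the HaGRID dataset exposes decoded PIL images under "image"
# and integer class ids under "label", ordered like the model's classes,
# and has a "test" split. Batching is omitted for brevity.
import torch
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name).eval()
processor = AutoImageProcessor.from_pretrained(model_name)

ds = load_dataset("cj-mills/hagrid-classification-512p-no-gesture-150k", split="test")

y_true, y_pred = [], []
for example in ds:
    inputs = processor(images=example["image"].convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    y_pred.append(int(logits.argmax(dim=-1)))
    y_true.append(int(example["label"]))

print(classification_report(y_true, y_pred, digits=4))
```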
The model categorizes images into nineteen hand gestures (a sketch for reading this mapping from the model config follows the list):

- **Class 0:** "call"
- **Class 1:** "dislike"
- **Class 2:** "fist"
- **Class 3:** "four"
- **Class 4:** "like"
- **Class 5:** "mute"
- **Class 6:** "no_gesture"
- **Class 7:** "ok"
- **Class 8:** "one"
- **Class 9:** "palm"
- **Class 10:** "peace"
- **Class 11:** "peace_inverted"
- **Class 12:** "rock"
- **Class 13:** "stop"
- **Class 14:** "stop_inverted"
- **Class 15:** "three"
- **Class 16:** "three2"
- **Class 17:** "two_up"
- **Class 18:** "two_up_inverted"
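The index-to-name mapping above can also be read from the checkpoint itself, assuming the fine-tuned config stores `id2label` (the Gradio example below hardcodes the same mapping instead, so this is not guaranteed):

```python
# Sketch: reading the label mapping from the model config. If id2label was
# not customized during fine-tuning, this may print generic LABEL_0..LABEL_18.
from transformers import SiglipForImageClassification

model = SiglipForImageClassification.from_pretrained("prithivMLmods/Hand-Gesture-19")
for idx in sorted(model.config.id2label):
    print(idx, model.config.id2label[idx])
```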
# **Run with Transformers🤗**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load model and processor
model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def hand_gesture_classification(image):
    """Predicts the hand gesture category from an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    labels = {
        "0": "call",
        "1": "dislike",
        "2": "fist",
        "3": "four",
        "4": "like",
        "5": "mute",
        "6": "no_gesture",
        "7": "ok",
        "8": "one",
        "9": "palm",
        "10": "peace",
        "11": "peace_inverted",
        "12": "rock",
        "13": "stop",
        "14": "stop_inverted",
        "15": "three",
        "16": "three2",
        "17": "two_up",
        "18": "two_up_inverted"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}

    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=hand_gesture_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Hand Gesture Classification",
    description="Upload an image to classify the hand gesture."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
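For a quick check without the Gradio UI, the same checkpoint can be applied to a single image file. This is a minimal sketch: `gesture.jpg` is a placeholder path, and the `id2label` lookup assumes the config stores the gesture names listed above.

```python
# Minimal single-image inference sketch (no Gradio). "gesture.jpg" is a
# placeholder path; swap in any hand-gesture image.
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

image = Image.open("gesture.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred_id = int(logits.argmax(dim=-1))
print(model.config.id2label.get(pred_id, str(pred_id)))  # falls back to the raw index
```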
# **Intended Use:**

The **Hand-Gesture-19** model is designed to classify hand gesture images into one of the nineteen categories listed above. Potential use cases include:

- **Human-Computer Interaction:** Enabling gesture-based controls for devices (a small dispatch sketch follows this list).
- **Sign Language Interpretation:** Assisting in recognizing sign language gestures.
- **Gaming & VR:** Enhancing immersive experiences with hand gesture recognition.
- **Robotics:** Facilitating gesture-based robotic control.
- **Security & Surveillance:** Identifying gestures for access control and safety monitoring.
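
As a loose illustration of the human-computer interaction case, predicted labels can be mapped to application commands. Everything below is hypothetical: the action names, the confidence threshold, and the assumption that an upstream function supplies one (label, confidence) pair per frame.

```python
# Hypothetical sketch: turning gesture predictions into app commands.
# The label -> command table and the 0.8 threshold are illustrative only;
# a real controller would also debounce repeated detections across frames.

GESTURE_ACTIONS = {
    "mute": "toggle_mute",
    "stop": "pause_playback",
    "like": "volume_up",
    "dislike": "volume_down",
}

def dispatch(label: str, confidence: float, threshold: float = 0.8):
    """Return a command for a confident, known gesture, else None."""
    if confidence < threshold or label == "no_gesture":
        return None
    return GESTURE_ACTIONS.get(label)

# Example: dispatch("mute", 0.97) -> "toggle_mute"
```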