Custom GroundingDINO Model
This is a custom-trained GroundingDINO model for object detection and grounding, compatible with the Hugging Face Transformers library.
Model Details
- Model Type: GroundingDINO
- Number of Classes: 1180
- Training Dataset: Custom dataset with 1180 object classes
- Architecture: GroundingDINO with Swin-T backbone
- Transformers Compatible: ✅ Yes
Usage with Transformers
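```python
import json

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import AutoConfig, AutoModel

repo_id = "your_username/your_model_name"

# Load model and config. The model class lives in the repo's custom
# modeling_groundingdino.py, so remote code must be trusted explicitly.
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# Load label map (downloaded from the model repo)
with open(hf_hub_download(repo_id=repo_id, filename="label_map.json"), "r") as f:
    label_map = json.load(f)

# Prepare text prompt from the first 100 class names.
# GroundingDINO's text encoder has a fixed token limit (max_text_len = 256
# in the original config), so overly long prompts are truncated.
text_prompt = ". ".join(list(label_map.values())[:100]) + "."

# Load and preprocess image
image = Image.open("your_image.jpg").convert("RGB")
# Add your image preprocessing here; the expected input format is defined
# by the custom forward() in modeling_groundingdino.py.

# Run inference
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])
    logits = outputs.logits
    boxes = outputs.boxes
```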
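The card does not document what `outputs.logits` and `outputs.boxes` contain. In the original GroundingDINO, each query's box is a normalized `(cx, cy, w, h)` in [0, 1] and its logits score it against every text token. Assuming this custom model keeps that convention (not confirmed here), the sketch below converts boxes to pixel corners and keeps queries whose best token score passes a threshold. `cxcywh_to_xyxy`, `filter_detections`, and the 0.35 default are illustrative choices, not part of this repository.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Sequence[float], width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def filter_detections(
    query_logits: Sequence[Sequence[float]],
    query_boxes: Sequence[Sequence[float]],
    width: int,
    height: int,
    box_threshold: float = 0.35,
) -> List[Tuple[float, Box]]:
    """Keep queries whose best per-token score (sigmoid of the max logit) passes the threshold.

    query_logits: one list of per-token logits per query (assumed layout).
    query_boxes: one normalized (cx, cy, w, h) box per query (assumed layout).
    """
    kept = []
    for token_logits, box in zip(query_logits, query_boxes):
        # sigmoid is monotonic, so sigmoid(max) equals the max of the sigmoids.
        score = sigmoid(max(token_logits))
        if score >= box_threshold:
            kept.append((score, cxcywh_to_xyxy(box, width, height)))
    return kept
```

With batch size 1 and tensor outputs, something like `filter_detections(logits[0].tolist(), boxes[0].tolist(), *image.size)` would apply it (PIL's `image.size` is `(width, height)`). Mapping the winning token back to a class name also requires the token offsets of each phrase in the prompt, which this sketch leaves out.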
Usage with Original Implementation
```python
from model_loader import ModelLoader, quick_inference

# Quick inference
results = quick_inference('your_image.jpg')

# Or load model manually
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device='cuda'
)
label_map = ModelLoader.load_label_map('label_map.json')
```
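JSON object keys are always strings, so if `label_map.json` maps class indices to names (an assumption; the card does not show the file's layout), integer lookups need the keys converted first. A minimal sketch of that step; `parse_label_map` and `load_label_map_int` are hypothetical helpers, not the behavior of `ModelLoader.load_label_map`, which is not documented here.

```python
import json
from typing import Dict


def parse_label_map(text: str) -> Dict[int, str]:
    """Parse a JSON label map whose keys are class indices stored as strings."""
    raw = json.loads(text)
    # JSON object keys are always strings; convert them so label_map[3] works.
    return {int(k): v for k, v in raw.items()}


def load_label_map_int(path: str) -> Dict[int, str]:
    """Read label_map.json from disk and return an int-keyed mapping."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_label_map(f.read())
```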
Model Files
- pytorch_model.bin: Model weights (transformers format)
- config.json: Transformers configuration
- modeling_groundingdino.py: Custom model class
- tokenizer_config.json: Tokenizer configuration
- label_map.json: Class label mapping (1180 classes)
- original_config.py: Original training configuration
Classes
This model can detect 1180 unique object classes including:
- blue and purple polka dot block
- blue and purple polka dot bowl
- blue and purple polka dot container
- blue and purple polka dot cross
- blue and purple polka dot diamond
- blue and purple polka dot flower
- blue and purple polka dot frame
- blue and purple polka dot heart
- blue and purple polka dot hexagon
- blue and purple polka dot l-shaped block
- blue and purple polka dot letter a
- blue and purple polka dot letter e
- blue and purple polka dot letter g
- blue and purple polka dot letter m
- blue and purple polka dot letter r
- blue and purple polka dot letter t
- blue and purple polka dot letter v
- blue and purple polka dot line
- blue and purple polka dot pallet
- blue and purple polka dot pan
... and 1160 more classes.
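The Transformers example above only puts the first 100 class names into its prompt. The original GroundingDINO configuration limits text input to 256 tokens (`max_text_len`), so covering all 1180 classes likely takes several prompts, with one forward pass per prompt. A minimal sketch that greedily packs names into `". "`-joined prompts under a token budget; `chunk_prompts` is a hypothetical helper, and its default word-count estimate undercounts real subword tokens, so pass a tokenizer-based counter when one is available.

```python
from typing import Callable, List, Optional


def _approx_token_count(name: str) -> int:
    # Rough fallback: one token per word plus one for the "." separator.
    return len(name.split()) + 1


def chunk_prompts(
    class_names: List[str],
    max_tokens: int = 250,
    count_tokens: Optional[Callable[[str], int]] = None,
) -> List[str]:
    """Greedily split class names into ". "-joined prompts within a token budget.

    max_tokens defaults a little below 256 to leave room for special tokens.
    A single name larger than the budget still gets a prompt of its own.
    """
    counter = count_tokens or _approx_token_count
    prompts: List[str] = []
    current: List[str] = []
    used = 0
    for name in class_names:
        cost = counter(name)
        # Start a new prompt when this name would overflow the current one.
        if current and used + cost > max_tokens:
            prompts.append(". ".join(current) + ".")
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        prompts.append(". ".join(current) + ".")
    return prompts
```

With a BERT-style tokenizer (here a hypothetical `tokenizer` object), a tighter estimate would be `chunk_prompts(list(label_map.values()), count_tokens=lambda s: len(tokenizer.tokenize(s)) + 1)`.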
Installation
```bash
pip install transformers torch torchvision
```
Example Classes
The model can detect objects with various:
- Colors: blue, red, green, yellow, purple, etc.
- Patterns: polka dot, stripe, paisley, swirl, checkerboard
- Shapes: block, bowl, container, cross, diamond, flower
- Combinations: "blue and purple polka dot block", "red stripe heart"
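Because class names combine a color, a pattern, and a shape, a smaller, targeted prompt can be built by filtering the label map first. A minimal sketch, assuming each attribute appears as plain space-separated words inside the name, as in the examples listed above; `select_classes` is a hypothetical helper. Whole-word matching means `shape="block"` also matches names such as "blue and purple polka dot l-shaped block".

```python
from typing import Iterable, List, Optional


def select_classes(
    class_names: Iterable[str],
    color: Optional[str] = None,
    pattern: Optional[str] = None,
    shape: Optional[str] = None,
) -> List[str]:
    """Return names that contain every given attribute as whole words."""
    selected = []
    for name in class_names:
        # Pad with spaces so " red " does not match inside e.g. "redwood".
        padded = f" {name} "
        if all(
            attr is None or f" {attr} " in padded
            for attr in (color, pattern, shape)
        ):
            selected.append(name)
    return selected
```

For example, `". ".join(select_classes(label_map.values(), shape="heart")) + "."` would build a prompt containing only heart classes.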
Performance
- Model Size: ~1.1 GB
- Parameters: ~172M
- Training: 12 epochs on custom dataset
- Memory Usage: ~2-4 GB GPU memory during inference
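As a rough check on these figures, weight memory is parameter count times bytes per parameter: about 172M × 4 bytes ≈ 0.69 GB in float32, or ≈ 0.34 GB in float16. Activations and the image and text features come on top of that during inference. The helper names below are illustrative; `count_parameters` expects a PyTorch module.

```python
# Bytes used by one parameter of each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_bytes(num_params: int, dtype: str = "float32") -> int:
    """Bytes needed for the raw weights only (no activations or buffers)."""
    return num_params * BYTES_PER_PARAM[dtype]


def count_parameters(model) -> int:
    """Total number of parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters())
```

Running `weight_bytes(count_parameters(model))` on the loaded model gives the float32 weight footprint of your own copy.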
Citation
If you use this model, please cite the original GroundingDINO paper:
```bibtex
@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```
License
This model is released under the MIT License.