Custom GroundingDINO Model

This is a custom-trained GroundingDINO model for object detection and grounding, compatible with the Hugging Face Transformers library.

Model Details

  • Model Type: GroundingDINO
  • Number of Classes: 1180
  • Training Dataset: Custom dataset with 1180 object classes
  • Architecture: GroundingDINO with Swin-T backbone
  • Transformers Compatible: ✅ Yes

Usage with Transformers

from transformers import AutoModel, AutoConfig
import torch
import json
from PIL import Image

# Load model and config (trust_remote_code=True is typically required
# because the model class is defined in modeling_groundingdino.py in this repo)
model = AutoModel.from_pretrained("your_username/your_model_name", trust_remote_code=True)
config = AutoConfig.from_pretrained("your_username/your_model_name", trust_remote_code=True)
model.eval()

# Load the class label mapping shipped with the model
with open("label_map.json", "r") as f:
    label_map = json.load(f)

# Build a text prompt from the first 100 class names, separated by ". "
text_prompt = ". ".join(list(label_map.values())[:100]) + "."

# Load and preprocess image
image = Image.open("your_image.jpg").convert("RGB")
# Add your image preprocessing here (resize and normalize to match training)

# Run inference
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])
    logits = outputs.logits
    boxes = outputs.boxes
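
The raw outputs still need score thresholding. A minimal post-processing sketch follows, assuming outputs.logits are per-query text-token logits and outputs.boxes are normalized boxes as in the reference GroundingDINO head; verify the shapes returned by this custom class before relying on it.

# Post-processing sketch: the shapes and semantics below are assumptions
# based on the reference GroundingDINO implementation, not guaranteed here
scores = logits.sigmoid().max(dim=-1).values   # assumed shape (..., num_queries, num_text_tokens)
keep = scores > 0.35                           # confidence threshold, tune as needed
kept_boxes = boxes[keep]                       # assumed normalized (cx, cy, w, h)
print(f"kept {int(keep.sum())} detections")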

Usage with Original Implementation

from model_loader import ModelLoader, quick_inference

# Quick inference
results = quick_inference('your_image.jpg')

# Or load model manually
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device='cuda'
)

label_map = ModelLoader.load_label_map('label_map.json')

Model Files

  • pytorch_model.bin: Model weights (transformers format)
  • config.json: Transformers configuration
  • modeling_groundingdino.py: Custom model class
  • tokenizer_config.json: Tokenizer configuration
  • label_map.json: Class label mapping (1180 classes)
  • original_config.py: Original training configuration
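
If you are not working from a local clone of the repository, individual files can be fetched from the Hub with huggingface_hub; a short sketch using the placeholder repo id from the usage example:

from huggingface_hub import hf_hub_download
import json

# Download the label map directly from the Hub (repo id is a placeholder)
label_map_path = hf_hub_download(repo_id="your_username/your_model_name", filename="label_map.json")
with open(label_map_path, "r") as f:
    label_map = json.load(f)
print(f"{len(label_map)} classes loaded")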

Classes

This model can detect 1180 unique object classes including:

  • blue and purple polka dot block
  • blue and purple polka dot bowl
  • blue and purple polka dot container
  • blue and purple polka dot cross
  • blue and purple polka dot diamond
  • blue and purple polka dot flower
  • blue and purple polka dot frame
  • blue and purple polka dot heart
  • blue and purple polka dot hexagon
  • blue and purple polka dot l-shaped block
  • blue and purple polka dot letter a
  • blue and purple polka dot letter e
  • blue and purple polka dot letter g
  • blue and purple polka dot letter m
  • blue and purple polka dot letter r
  • blue and purple polka dot letter t
  • blue and purple polka dot letter v
  • blue and purple polka dot line
  • blue and purple polka dot pallet
  • blue and purple polka dot pan

... and 1160 more classes.
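
GroundingDINO text prompts have a limited token budget, so covering all 1180 classes in a single pass is usually not practical (the usage example above only takes the first 100 names). A simple workaround, sketched below, is to split the label map into several prompts and run one forward pass per chunk; the chunk size of 100 is an arbitrary choice.

# Split the 1180 class names into prompt-sized chunks (chunk size is arbitrary)
class_names = list(label_map.values())
chunk_size = 100
prompts = [
    ". ".join(class_names[i:i + chunk_size]) + "."
    for i in range(0, len(class_names), chunk_size)
]
# Each prompt is then passed to the model in a separate forward pass
print(f"{len(prompts)} prompts covering {len(class_names)} classes")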

Installation

pip install transformers torch torchvision

Example Classes

The model can detect objects described by a variety of attributes (a small lookup sketch follows this list):

  • Colors: blue, red, green, yellow, purple, etc.
  • Patterns: polka dot, stripe, paisley, swirl, checkerboard
  • Shapes: block, bowl, container, cross, diamond, flower
  • Combinations: "blue and purple polka dot block", "red stripe heart"
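
A small lookup sketch for finding class names that contain a given attribute string, assuming label_map.json has already been loaded as in the usage example:

# Return all class names containing the given attribute substring
def classes_matching(label_map, attribute):
    return [name for name in label_map.values() if attribute in name]

print(classes_matching(label_map, "polka dot")[:5])
print(classes_matching(label_map, "heart")[:5])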

Performance

  • Model Size: ~1.1 GB
  • Parameters: ~172M
  • Training: 12 epochs on the custom dataset
  • Memory Usage: ~2-4 GB GPU memory during inference
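
To verify the memory figure on your own hardware, peak GPU usage around a forward pass can be checked with torch's CUDA memory statistics; a rough sketch reusing the model, image, and text_prompt from the usage example (actual numbers depend on image size and prompt length):

import torch

# Measure peak GPU memory around a single forward pass (rough estimate)
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory: {peak_gb:.2f} GB")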

Citation

If you use this model, please cite the original GroundingDINO paper:

@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}

License

This model is released under the MIT License.
