fisheye8k_Omnifact_conditional-detr-resnet-101-dc5

This model is a fine-tuned version of Omnifact/conditional-detr-resnet-101-dc5 on the Fisheye8K dataset. It is part of the Mcity Data Engine project.

This model was presented in the paper Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection.

Model description

This is an object detection model fine-tuned for fisheye camera data, with a focus on Intelligent Transportation Systems (ITS). It is a key artifact of the Mcity Data Engine, an open-source system that provides a complete data-based development cycle, from data acquisition to model deployment, for continuously improving machine learning models.

The Mcity Data Engine focuses on addressing the challenge of detecting rare and novel long-tail classes in large amounts of unlabeled data through an open-vocabulary data selection process. This model checkpoint demonstrates the application of this iterative improvement framework to enhance perception capabilities in complex transportation environments.

Intended uses & limitations

Intended uses

Object detection in fisheye camera imagery within Intelligent Transportation Systems (ITS).
Identifying both common and long-tail object classes such as vehicles (Bus, Bike, Car, Truck) and Vulnerable Road Users (Pedestrian).
Integration into iterative model improvement pipelines using the Mcity Data Engine framework.
Research and development in autonomous driving and roadside perception, particularly for data-centric AI approaches.

Limitations

Performance may vary on datasets significantly different from the training distribution (Fisheye8K), especially for camera types other than fisheye.
Although the Mcity Data Engine is designed for open-vocabulary data selection, this model's generalization to entirely novel or highly obscured objects may require further iterative data enrichment and fine-tuning.
The model performs best when it is integrated into the continuous data improvement loop enabled by the Mcity Data Engine.

Training and evaluation data

This model was fine-tuned on the Voxel51/fisheye8k dataset. The Fisheye8K dataset is specifically curated for object detection in fisheye camera images, capturing diverse urban and suburban scenarios relevant to intelligent transportation. The data originates from vehicle fleets and roadside perception systems, providing a rich source for training robust object detection models.

Usage

You can use this model directly with the Hugging Face transformers library for object detection.

from transformers import pipeline
from PIL import Image
import requests
from io import BytesIO

# Load the object detection pipeline
model_id = "mcity-data-engine/fisheye8k_Omnifact_conditional-detr-resnet-101-dc5"
detector = pipeline("object-detection", model=model_id)

# Example image (replace with your fisheye image or a relevant ITS image)
# This example uses a generic image. For best results, use an image from the model's domain.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/conditional_detr_image.png"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed
image = Image.open(BytesIO(response.content)).convert("RGB")

# Perform inference
predictions = detector(image)

# Print detected objects
for pred in predictions:
    print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")

# Example output format:
# [{'box': {'xmin': 10, 'ymin': 20, 'xmax': 100, 'ymax': 120}, 'score': 0.98, 'label': 'Car'}]
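The list returned by the pipeline can be post-processed before it is used downstream. The following is a minimal sketch, assuming predictions in the output format shown above; the helper name `filter_predictions`, the sample values, and the exact label strings are illustrative and not part of the model or the library.

```python
def filter_predictions(predictions, min_score=0.5, labels=None):
    """Keep detections at or above min_score, optionally restricted to given labels.

    Returns the kept detections sorted by descending score.
    """
    kept = [
        p for p in predictions
        if p["score"] >= min_score and (labels is None or p["label"] in labels)
    ]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# Hypothetical detections in the pipeline's output format (values are made up)
sample_predictions = [
    {"box": {"xmin": 10, "ymin": 20, "xmax": 100, "ymax": 120}, "score": 0.98, "label": "Car"},
    {"box": {"xmin": 200, "ymin": 40, "xmax": 230, "ymax": 110}, "score": 0.42, "label": "Pedestrian"},
    {"box": {"xmin": 300, "ymin": 60, "xmax": 420, "ymax": 180}, "score": 0.76, "label": "Bus"},
]

# Keep only confident detections
for pred in filter_predictions(sample_predictions, min_score=0.7):
    print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")

# Keep only Vulnerable Road Users, with a lower score cutoff
pedestrians = filter_predictions(sample_predictions, min_score=0.3, labels={"Pedestrian"})
print(f"Pedestrian detections: {len(pedestrians)}")
```

The object detection pipeline call also accepts a `threshold` argument, so a simple score cutoff can be applied at inference time instead, for example `detector(image, threshold=0.7)`.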

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 8
seed: 0
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
num_epochs: 36
mixed_precision_training: Native AMP
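To illustrate what a cosine schedule implies for the learning rate, here is a rough sketch in plain Python. It assumes no warmup (none is listed above), a single half-cosine decay to zero, and 5288 optimizer steps per epoch, a figure taken from the Step column of the training results; the function name `cosine_lr` is hypothetical and this is not the training code itself.

```python
import math

BASE_LR = 5e-05          # learning_rate from the hyperparameter list
NUM_EPOCHS = 36          # num_epochs from the hyperparameter list
STEPS_PER_EPOCH = 5288   # assumption: read from the Step column of the training results
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR):
    """Learning rate at a given step for a half-cosine decay without warmup."""
    # Clamp the step so the schedule stays defined outside [0, total_steps]
    progress = min(max(step, 0), total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at the start, after 8 epochs, at the midpoint, and at the end
for epoch in (0, 8, 18, 36):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:2d} (step {step:6d}): lr = {cosine_lr(step):.3e}")
```

Under these assumptions the rate starts at the configured 5e-05, reaches half of that at the midpoint of training, and decays to zero at the final step.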

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0147 | 1.0 | 5288 | 1.5035 |
| 0.9144 | 2.0 | 10576 | 1.4618 |
| 0.8685 | 3.0 | 15864 | 1.3823 |
| 0.8375 | 4.0 | 21152 | 1.5128 |
| 0.7715 | 5.0 | 26440 | 1.5045 |
| 0.7664 | 6.0 | 31728 | 1.6914 |
| 0.7073 | 7.0 | 37016 | 1.6101 |
| 0.6966 | 8.0 | 42304 | 1.6175 |
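
The logged rows can be inspected programmatically, for instance to find the epoch with the lowest validation loss. The snippet below is a small sketch that copies the values from the table above into a list of tuples; the variable names are illustrative.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above
results = [
    (1, 5288, 1.0147, 1.5035),
    (2, 10576, 0.9144, 1.4618),
    (3, 15864, 0.8685, 1.3823),
    (4, 21152, 0.8375, 1.5128),
    (5, 26440, 0.7715, 1.5045),
    (6, 31728, 0.7073 + 0.0591, 1.6914),  # 0.7664, written out below for clarity
    (7, 37016, 0.7073, 1.6101),
    (8, 42304, 0.6966, 1.6175),
]
# Replace the computed entry with the literal table value to avoid float noise
results[5] = (6, 31728, 0.7664, 1.6914)

# Epoch with the lowest validation loss
best = min(results, key=lambda row: row[3])
print(f"Lowest validation loss {best[3]} at epoch {best[0]} (step {best[1]})")

# Steps per epoch implied by the Step column
steps_per_epoch = {step // epoch for epoch, step, _, _ in results}
print(f"Steps per epoch: {sorted(steps_per_epoch)}")

# Training loss falls monotonically over the logged epochs
train_losses = [row[2] for row in results]
print("Training loss strictly decreasing:",
      all(a > b for a, b in zip(train_losses, train_losses[1:])))
```

In the logged rows, the training loss decreases every epoch while the validation loss is lowest at epoch 3.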

Framework versions

Transformers 4.48.3
PyTorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0

Acknowledgements

Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends.

Citation

If you use the Mcity Data Engine in your research, feel free to cite the project:

@article{bogdoll2025mcitydataengine,
  title={Mcity Data Engine},
  author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
  journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
  year={2025}
}
