fisheye8k_Omnifact_conditional-detr-resnet-101-dc5
This model is a fine-tuned version of Omnifact/conditional-detr-resnet-101-dc5 on the Fisheye8K dataset. It is part of the Mcity Data Engine project.
This model was presented in the paper Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection.
Model description
This is an object detection model fine-tuned to detect objects in fisheye camera images, which are particularly relevant to Intelligent Transportation Systems (ITS). It is a key artifact of the Mcity Data Engine, an open-source system that covers the complete data-centric development cycle, from data acquisition to model deployment, for continuously improving machine learning models.
The Mcity Data Engine addresses the problem of finding rare and novel long-tail classes in large amounts of unlabeled data through open-vocabulary data selection. This checkpoint applies that iterative improvement framework to perception in complex traffic environments.
Intended uses & limitations
Intended uses
- Object detection in fisheye camera imagery within Intelligent Transportation Systems (ITS).
- Detecting common and long-tail object classes: vehicles (Bus, Bike, Car, Truck) and vulnerable road users (Pedestrian).
- Integration into iterative model improvement pipelines using the Mcity Data Engine framework.
- Research and development in autonomous driving and roadside perception, particularly for data-centric AI approaches.
Limitations
- Performance may degrade on data that differs significantly from the training distribution (Fisheye8K), especially on images from non-fisheye cameras.
- Although the Mcity Data Engine uses open-vocabulary data selection, this model itself is a detector fine-tuned on a fixed set of classes. Detecting entirely novel or heavily occluded objects may require further iterations of data enrichment and fine-tuning.
- Best results are expected when the model is used within the continuous data-improvement loop enabled by the Mcity Data Engine.
Training and evaluation data
This model was fine-tuned on the Voxel51/fisheye8k dataset. Fisheye8K is curated for object detection in fisheye camera images and covers diverse urban and suburban scenes relevant to intelligent transportation. The data originates from vehicle fleets and roadside perception systems.
Usage
You can use this model directly with the Hugging Face transformers library for object detection.
from io import BytesIO

import requests
from PIL import Image
from transformers import pipeline

# Load the object detection pipeline
model_id = "mcity-data-engine/fisheye8k_Omnifact_conditional-detr-resnet-101-dc5"
detector = pipeline("object-detection", model=model_id)

# Example image: a generic image, not from the model's domain.
# For best results, replace it with a fisheye or other ITS image.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/conditional_detr_image.png"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early if the download did not succeed
image = Image.open(BytesIO(response.content)).convert("RGB")

# Perform inference
predictions = detector(image)

# Print detected objects
for pred in predictions:
    print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")

# `predictions` is a list of dicts, for example (illustrative values):
# [{'score': 0.98, 'label': 'Car', 'box': {'xmin': 10, 'ymin': 20, 'xmax': 100, 'ymax': 120}}]
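The pipeline returns a list of dictionaries with `score`, `label`, and `box` keys. The sketch below post-processes that output in plain Python: it drops detections below a confidence cutoff, counts the remaining detections per class, and converts boxes to COCO-style `[x, y, width, height]`. The helper names (`filter_detections`, `to_coco_xywh`, `summarize`), the 0.5 cutoff, and the sample detections are illustrative assumptions, not part of the model or of the transformers API.

```python
from collections import Counter
from typing import Dict, List, Tuple


def filter_detections(predictions: List[dict], min_score: float = 0.5) -> List[dict]:
    """Keep detections whose confidence is at least `min_score` (cutoff is an assumption)."""
    return [p for p in predictions if p["score"] >= min_score]


def to_coco_xywh(box: Dict[str, int]) -> List[int]:
    """Convert a pipeline box {xmin, ymin, xmax, ymax} to COCO [x, y, width, height]."""
    return [box["xmin"], box["ymin"], box["xmax"] - box["xmin"], box["ymax"] - box["ymin"]]


def summarize(predictions: List[dict], min_score: float = 0.5) -> Tuple[Counter, List[tuple]]:
    """Filter by score, then return per-class counts and (label, xywh box) pairs."""
    kept = filter_detections(predictions, min_score)
    counts = Counter(p["label"] for p in kept)
    boxes = [(p["label"], to_coco_xywh(p["box"])) for p in kept]
    return counts, boxes


if __name__ == "__main__":
    # Hypothetical detections in the pipeline's output format (not real model output).
    sample = [
        {"score": 0.98, "label": "Car", "box": {"xmin": 10, "ymin": 20, "xmax": 100, "ymax": 120}},
        {"score": 0.41, "label": "Pedestrian", "box": {"xmin": 200, "ymin": 50, "xmax": 230, "ymax": 140}},
        {"score": 0.77, "label": "Car", "box": {"xmin": 300, "ymin": 80, "xmax": 380, "ymax": 150}},
    ]
    counts, boxes = summarize(sample, min_score=0.5)
    print(counts)
    print(boxes)
```

With real inference, pass the `predictions` list from the pipeline example in place of `sample`.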
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (PyTorch implementation, `adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36
- mixed_precision_training: Native AMP
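To see how the cosine schedule decays the learning rate, it can be sketched in plain Python. This is an approximation under stated assumptions: it follows the formula of the Hugging Face cosine schedule with zero warmup steps (no warmup is listed above), and it takes the total step count as 36 epochs × 5288 steps per epoch, with the per-epoch step count read from the training results table. The function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-05         # learning_rate from the hyperparameter list
STEPS_PER_EPOCH = 5288  # from the Step column of the training results table
NUM_EPOCHS = 36         # num_epochs from the hyperparameter list
WARMUP_STEPS = 0        # ASSUMPTION: no warmup is listed, so none is modeled


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = STEPS_PER_EPOCH * NUM_EPOCHS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by a half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Learning rate at the start of selected epochs.
    for epoch in (0, 3, 8, 18, 36):
        print(f"epoch {epoch:>2}: lr = {cosine_lr(epoch * STEPS_PER_EPOCH):.3e}")
```

Because the schedule is computed over all 36 planned epochs, the learning rate in this sketch is still close to its initial value during the first few epochs.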
Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0147 | 1.0 | 5288 | 1.5035 |
| 0.9144 | 2.0 | 10576 | 1.4618 |
| 0.8685 | 3.0 | 15864 | 1.3823 |
| 0.8375 | 4.0 | 21152 | 1.5128 |
| 0.7715 | 5.0 | 26440 | 1.5045 |
| 0.7664 | 6.0 | 31728 | 1.6914 |
| 0.7073 | 7.0 | 37016 | 1.6101 |
| 0.6966 | 8.0 | 42304 | 1.6175 |
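The logged losses can be checked with a few lines of Python. The sketch copies the table rows into a list (the row layout and helper names are illustrative) and shows that validation loss is lowest at epoch 3 while training loss keeps falling. It also shows that every epoch adds exactly 5288 steps, which at `train_batch_size: 1` corresponds to 5288 training samples per epoch, assuming a single device and no gradient accumulation. The log ends at epoch 8 although `num_epochs` is 36; the card does not say why, or which epoch the published checkpoint comes from.

```python
from typing import List, Set, Tuple

Row = Tuple[int, int, float, float]  # (epoch, step, training_loss, validation_loss)

# Values copied from the training results table.
RESULTS: List[Row] = [
    (1, 5288, 1.0147, 1.5035),
    (2, 10576, 0.9144, 1.4618),
    (3, 15864, 0.8685, 1.3823),
    (4, 21152, 0.8375, 1.5128),
    (5, 26440, 0.7715, 1.5045),
    (6, 31728, 0.7664, 1.6914),
    (7, 37016, 0.7073, 1.6101),
    (8, 42304, 0.6966, 1.6175),
]


def best_row(rows: List[Row]) -> Row:
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r[3])


def step_increments(rows: List[Row]) -> Set[int]:
    """Distinct step increments between consecutive logged epochs."""
    return {later[1] - earlier[1] for earlier, later in zip(rows, rows[1:])}


def epochs_after_best(rows: List[Row]) -> int:
    """Number of logged epochs after the best validation loss."""
    return rows[-1][0] - best_row(rows)[0]


if __name__ == "__main__":
    epoch, step, train_loss, val_loss = best_row(RESULTS)
    print(f"Lowest validation loss {val_loss} at epoch {epoch} (step {step})")
    print("Step increments per epoch:", step_increments(RESULTS))
    print("Epochs logged after the best one:", epochs_after_best(RESULTS))
```

A rising validation loss alongside a falling training loss after epoch 3 is consistent with overfitting, though loss alone does not measure detection accuracy.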
Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
Links
- Paper: Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection
- Project Documentation: Mcity Data Engine Docs
- GitHub Repository: mcity/mcity_data_engine
- Google Colab Demo: Mcity Data Engine Web Demo
Acknowledgements
Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends.
Citation
If you use the Mcity Data Engine in your research, feel free to cite the project:
@article{bogdoll2025mcitydataengine,
title={Mcity Data Engine},
author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
year={2025}
}
Model tree for mcity-data-engine/fisheye8k_Omnifact_conditional-detr-resnet-101-dc5
Base model
Omnifact/conditional-detr-resnet-101-dc5