🦒 Swin S3 Base (224) - Pascal VOC

A Swin S3 Base model fine-tuned on the Pascal VOC 2012 dataset for multi-label image classification.


🧠 Model Details

  • Architecture: Swin S3 Base (224x224 input size)
  • Pretrained on: ImageNet-1k
  • Fine-tuned on: Pascal VOC 2012
  • Framework: PyTorch (timm implementation)
  • Format: safetensors
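A minimal loading sketch for the details above. It assumes the timm model id is `swin_s3_base_224`, that the classifier head has 20 outputs, and that `model.safetensors` holds a plain timm state dict; none of these are confirmed by this card, and `load_model` and `build_transform` are hypothetical helper names.

```python
# Assumed timm identifier and head size; the card only says "Swin S3 Base (224)".
TIMM_MODEL_NAME = "swin_s3_base_224"
NUM_CLASSES = 20


def load_model(weights_path="model.safetensors"):
    """Build the architecture in timm and load the fine-tuned weights."""
    # Imports are deferred so this module can be imported without the heavy deps.
    import timm
    from safetensors.torch import load_file

    model = timm.create_model(
        TIMM_MODEL_NAME, pretrained=False, num_classes=NUM_CLASSES
    )
    # Assumption: the checkpoint keys match timm's parameter names exactly.
    model.load_state_dict(load_file(weights_path))
    return model.eval()


def build_transform(model):
    """Create the eval-time preprocessing that timm associates with the model."""
    from timm.data import create_transform, resolve_data_config

    config = resolve_data_config({}, model=model)
    return create_transform(**config)
```

If loading fails with missing or unexpected keys, the checkpoint was probably saved with a different wrapper or prefix, and the keys would need to be renamed first.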

🎯 Intended Use

  • Primary task: Image classification of natural scenes featuring objects from 20 Pascal VOC categories.
  • Users: Researchers and developers working on computer vision applications or model benchmarking.
  • Not intended for: Real-time decision making in critical applications (e.g., autonomous vehicles, medical diagnosis).

⚠️ Limitations and Ethical Considerations

  • Biases: The model inherits biases present in Pascal VOC, such as underrepresentation of certain object types, contexts, or demographics. It may perform poorly on out-of-distribution samples.
  • Ethical Use: Avoid using this model for applications that could reinforce harmful stereotypes, cause social harm, or violate privacy (e.g., surveillance).
  • Transparency: This model is shared for research and educational use and should not be deployed without thorough fairness, robustness, and security evaluations.

⚙️ Training Details

  • Training library: timm + PyTorch
  • Epochs: 5
  • Batch size: 16
  • Optimizer: AdamW
  • Learning rate: 5e-5
  • Scheduler: Cosine Annealing
  • Loss function: BCE
  • Hardware: 1x NVIDIA A100 on Google Colab Pro
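The hyperparameters above could be wired together roughly as follows. This is a sketch, not the original training script: it assumes "BCE" means `BCEWithLogitsLoss` on raw logits, that the cosine schedule is stepped once per epoch with a minimum learning rate of 0, and that AdamW uses its default weight decay. `TrainConfig`, `cosine_annealed_lr`, and `build_training_objects` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the list above.
    epochs: int = 5
    batch_size: int = 16
    lr: float = 5e-5


def cosine_annealed_lr(step, total_steps, base_lr, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at total_steps."""
    cosine = math.cos(math.pi * step / total_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


def build_training_objects(model, cfg=TrainConfig()):
    """Create optimizer, scheduler, and loss (assumed wiring, see lead-in)."""
    import torch  # deferred so the schedule helper works without PyTorch

    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.lr)
    # Assumption: one scheduler.step() per epoch, hence T_max = epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=cfg.epochs
    )
    # Assumption: multi-label targets as 0/1 float vectors of length 20.
    criterion = torch.nn.BCEWithLogitsLoss()
    return optimizer, scheduler, criterion
```

With these assumptions the learning rate starts at 5e-5 and decays to 0 by the end of the fifth epoch.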



📊 Evaluation Results

Evaluated on the Pascal VOC 2012 validation split:

| Metric  | Value |
|---------|-------|
| roc_auc | 98.9% |

Note: Evaluation used standard multi-label classification metrics. The model was not evaluated on cross-domain generalization.
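The card does not say how the ROC AUC was averaged across classes. The sketch below shows one common choice, a macro average of per-class AUCs, written in plain Python so the computation is visible; the averaging mode and the function names are assumptions, and the original evaluation may have used a library routine instead.

```python
def binary_roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_roc_auc(label_rows, score_rows):
    """Unweighted mean of per-class AUCs; classes with one label value are skipped."""
    num_classes = len(label_rows[0])
    per_class = []
    for c in range(num_classes):
        labels = [row[c] for row in label_rows]
        scores = [row[c] for row in score_rows]
        if 0 < sum(labels) < len(labels):
            per_class.append(binary_roc_auc(labels, scores))
    if not per_class:
        raise ValueError("no class has both positive and negative samples")
    return sum(per_class) / len(per_class)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but slow on a full evaluation split; a rank-based implementation gives the same value faster.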


📚 Dataset

  • Name: Pascal VOC 2012
  • License: Creative Commons Attribution 4.0 International
  • Labels: 20 object categories (person, car, dog, etc.)
  • Split used: Training for fine-tuning, validation for evaluation
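For reference, the 20 Pascal VOC categories and a way to turn model outputs into labels can be sketched as follows. The class names are the standard VOC ones, but their order in this model's output head is an assumption (alphabetical, as in the VOC devkit), as are the 0.5 threshold and the helper name `decode_logits`.

```python
import math

# Standard Pascal VOC category names; output index order is assumed, not confirmed.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_logits(logits, threshold=0.5):
    """Return (class_name, probability) for every class at or above the threshold.

    Each class is scored independently, matching a BCE-trained multi-label head.
    """
    if len(logits) != len(VOC_CLASSES):
        raise ValueError(f"expected {len(VOC_CLASSES)} logits, got {len(logits)}")
    probs = [sigmoid(x) for x in logits]
    return [(name, p) for name, p in zip(VOC_CLASSES, probs) if p >= threshold]
```

For example, a logit vector that is strongly negative everywhere except at the "dog" and "person" positions would decode to those two labels.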

💾 Files in This Repository

  • model.safetensors: Model weights
  • README.md: Model card (this file)

🔗 Citations

@inproceedings{liu2021swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yu and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={ICCV},
  year={2021}
}

@article{Everingham10,
  author = {Everingham, M. and Van Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
  title = {The Pascal Visual Object Classes (VOC) Challenge},
  journal = {IJCV},
  year = {2010},
  volume = {88},
  number = {2},
  pages = {303--338}
}