---
license: apache-2.0
language:
- en
pipeline_tag: image-classification
library_name: transformers
tags:
- notebook
- colab
- siglip2
- image-to-text
---
This notebook demonstrates how to fine-tune SigLIP 2, a robust multilingual vision-language model, for single-label image classification. SigLIP 2's pretraining recipe unifies captioning-based pretraining, self-distillation, and masked prediction; the notebook builds on these pretrained representations with a streamlined fine-tuning pipeline. The workflow supports datasets in both structured and unstructured forms, making it adaptable to a range of domains and resource levels.
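A minimal sketch of the model setup, assuming a SigLIP 2 checkpoint such as `google/siglip2-base-patch16-224` and a placeholder label set (the notebooks may use a different checkpoint and classes):

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Hypothetical checkpoint and labels; substitute your own dataset's classes.
checkpoint = "google/siglip2-base-patch16-224"
labels = ["cat", "dog", "bird"]

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label={i: name for i, name in enumerate(labels)},
    label2id={name: i for i, name in enumerate(labels)},
    ignore_mismatched_sizes=True,  # a fresh classification head replaces the pretrained one
)
```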
| Notebook Name | Description | Notebook Link |
|---|---|---|
| notebook-siglip2-finetune-type1 | Train/Test Splits | ⬇️Download |
| notebook-siglip2-finetune-type2 | Only Train Split | ⬇️Download |
The notebooks cover two data handling scenarios. In the first, the dataset ships with predefined train and test splits, enabling conventional supervised training and a held-out generalization evaluation. In the second, only a training split is available; the training set is then either partially reserved for validation or reused entirely for evaluation. This flexibility supports experimentation in constrained or domain-specific settings where standard test annotations may not exist. A sketch of both setups follows.
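A minimal sketch of the two scenarios using the `datasets` library, assuming a hypothetical Hub dataset `user/my-dataset`:

```python
from datasets import load_dataset

# Scenario 1: the dataset already provides train and test splits.
dataset = load_dataset("user/my-dataset")  # hypothetical dataset id
train_ds, test_ds = dataset["train"], dataset["test"]

# Scenario 2: only a train split exists; reserve a fraction for validation.
dataset = load_dataset("user/my-dataset", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```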
Last updated: July 2025

