---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- document-classification
- knowledge-distillation
- vit
- rvl-cdip
- tiny-model
- distilled-model
datasets:
- rvl_cdip
metrics:
- accuracy
pipeline_tag: image-classification
---
# ViT-Tiny Classifier for RVL-CDIP Document Classification (Distilled)
This model is a compressed Vision Transformer (ViT-Tiny) trained via knowledge distillation from DiT-Large on the RVL-CDIP dataset for document image classification.
It was developed as part of a **research internship at the Laboratory of Complex Systems, Ecole Centrale Casablanca**.
## Model Details
- **Student Model**: ViT-Tiny (Vision Transformer)
- **Teacher Model**: microsoft/dit-large-finetuned-rvlcdip
- **Training Method**: Knowledge Distillation
- **Parameters**: ~5.5M (55x smaller than teacher)
- **Dataset**: RVL-CDIP (400k document images across 16 classes; 320k training split)
- **Task**: Document Image Classification
- **Accuracy**: 0.9210
- **Compression Ratio**: ~55x parameter reduction from teacher model
## Document Classes
The model classifies documents into 16 categories:
1. **letter** - Personal or business correspondence
2. **form** - Structured forms and applications
3. **email** - Email communications
4. **handwritten** - Handwritten documents
5. **advertisement** - Marketing materials and ads
6. **scientific_report** - Research reports and studies
7. **scientific_publication** - Academic papers and journals
8. **specification** - Technical specifications
9. **file_folder** - File folders and organizational documents
10. **news_article** - News articles and press releases
11. **budget** - Financial budgets and planning documents
12. **invoice** - Bills and invoices
13. **presentation** - Presentation slides
14. **questionnaire** - Surveys and questionnaires
15. **resume** - CVs and resumes
16. **memo** - Internal memos and notices
## Usage
```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the processor and model
processor = AutoImageProcessor.from_pretrained("HAMMALE/vit-tiny-classifier-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("HAMMALE/vit-tiny-classifier-rvlcdip")
model.eval()

# Load a document image; convert to RGB since scanned documents are often grayscale
image = Image.open("path_to_your_document_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Get predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predicted_class_id = outputs.logits.argmax(-1).item()

# Class names in RVL-CDIP label order
class_names = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific_report", "scientific_publication", "specification",
    "file_folder", "news_article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]
predicted_class = class_names[predicted_class_id]
print("Predicted class:", predicted_class)
```
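If the checkpoint was exported with its label mapping (as Transformers classification models typically are), you can skip the hardcoded list and read it from the config instead:

```python
# Equivalent lookup via the mapping stored in the model config,
# assuming the checkpoint was saved with its id2label table
predicted_class = model.config.id2label[predicted_class_id]
```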
## Performance
| Metric | Value |
|--------|-------|
| Accuracy | 0.9210 |
| Parameters | ~5.5M |
| Model Size | ~22 MB |
| Input Size | 224x224 pixels |
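To sanity-check the ~5.5M parameter figure, you can count parameters directly on the model loaded in the Usage section above:

```python
# Count parameters of the loaded student model
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")  # expected: roughly 5.5M
```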
## Training Details
- **Student Architecture**: Vision Transformer (ViT-Tiny)
- **Teacher Model**: microsoft/dit-large-finetuned-rvlcdip
- **Distillation Method**: Knowledge Distillation
- **Input Resolution**: 224x224
- **Preprocessing**: Standard ImageNet normalization
- **Framework**: Transformers/PyTorch
- **Distillation Benefits**: Maintains high accuracy with ~55x fewer parameters than the teacher
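The exact training script and hyperparameters are not published in this card. As a minimal sketch, a standard soft-target distillation loss (Hinton et al., 2015) of the kind described above looks as follows, where the temperature `T` and mixing weight `alpha` are illustrative values, not the ones used to train this checkpoint:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target knowledge distillation; T and alpha are illustrative."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```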
## Dataset
The RVL-CDIP (Ryerson Vision Lab Complex Document Information Processing) dataset contains:
- 400,000 grayscale document images
- 16 document categories
- Images sourced from the Truth Tobacco Industry Documents collection
- Standard train/validation/test splits (320k/40k/40k)
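As a sketch, the dataset can be loaded with the `datasets` library, assuming the `rvl_cdip` id listed in this card's metadata resolves on the Hub:

```python
from datasets import load_dataset

# Stream the test split to avoid downloading all ~400k images up front
dataset = load_dataset("rvl_cdip", split="test", streaming=True)
sample = next(iter(dataset))
print(sample["label"])  # integer in [0, 15], following the class order listed above
```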
## Citation
```bibtex
@misc{hammale2025vit_tiny_rvlcdip_distilled,
title={ViT-Tiny Classifier for RVL-CDIP Document Classification (Distilled)},
author={Hammale, Mourad},
year={2025},
howpublished={\url{https://huggingface.co/HAMMALE/vit-tiny-classifier-rvlcdip}},
note={Knowledge distilled from microsoft/dit-large-finetuned-rvlcdip}
}
```
## Acknowledgments
This model was created by HAMMALE (Mourad) through knowledge distillation from the larger DiT-Large model (microsoft/dit-large-finetuned-rvlcdip), achieving significant compression while maintaining competitive performance for document classification tasks.
## License
This model is released under the Apache 2.0 license.