Upload folder using huggingface_hub
- README.md +136 -0
- config.json +61 -0
- model.safetensors +3 -0
- preprocessor_config.json +23 -0
README.md
ADDED
@@ -0,0 +1,136 @@
---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- document-classification
- knowledge-distillation
- vit
- rvl-cdip
- tiny-model
- distilled-model
datasets:
- rvl_cdip
metrics:
- accuracy
pipeline_tag: image-classification
widget:
- src: https://huggingface.co/datasets/rvl_cdip/resolve/main/sample_images/letter_0.jpg
  example_title: Letter
- src: https://huggingface.co/datasets/rvl_cdip/resolve/main/sample_images/form_0.jpg
  example_title: Form
---

# ViT-Tiny Classifier for RVL-CDIP Document Classification (Distilled)

This model is a compressed Vision Transformer (ViT-Tiny) trained via knowledge distillation from DiT-Large on the RVL-CDIP dataset for document image classification.

## Model Details

- **Student Model**: ViT-Tiny (Vision Transformer)
- **Teacher Model**: microsoft/dit-large-finetuned-rvlcdip
- **Training Method**: Knowledge Distillation
- **Parameters**: ~5.5M (~55x smaller than the teacher)
- **Dataset**: RVL-CDIP (400,000 document images, 16 classes)
- **Task**: Document Image Classification
- **Accuracy**: To be evaluated
- **Compression Ratio**: ~55x parameter reduction relative to the teacher

## Document Classes

The model classifies documents into 16 categories:

1. **letter** - Personal or business correspondence
2. **form** - Structured forms and applications
3. **email** - Email communications
4. **handwritten** - Handwritten documents
5. **advertisement** - Marketing materials and ads
6. **scientific_report** - Research reports and studies
7. **scientific_publication** - Academic papers and journals
8. **specification** - Technical specifications
9. **file_folder** - File folders and organizational documents
10. **news_article** - News articles and press releases
11. **budget** - Financial budgets and planning documents
12. **invoice** - Bills and invoices
13. **presentation** - Presentation slides
14. **questionnaire** - Surveys and questionnaires
15. **resume** - CVs and resumes
16. **memo** - Internal memos and notices

## Usage

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the processor and model
processor = AutoImageProcessor.from_pretrained("HAMMALE/vit-tiny-classifier-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("HAMMALE/vit-tiny-classifier-rvlcdip")

# Load a document image; RVL-CDIP scans are grayscale, but the model
# expects 3 channels, so convert to RGB first
image = Image.open("path_to_your_document_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
predicted_class_id = outputs.logits.argmax(-1).item()

# The class names are stored in the model config, so there is no need
# to hard-code the label list
predicted_class = model.config.id2label[predicted_class_id]
print("Predicted class:", predicted_class)
```
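
For quick experimentation, the model should also work through the high-level `pipeline` API (a minimal sketch; the file path is a placeholder):

```python
from transformers import pipeline

# Image-classification pipeline wrapping the processor and model above
classifier = pipeline("image-classification", model="HAMMALE/vit-tiny-classifier-rvlcdip")
print(classifier("path_to_your_document_image.jpg"))  # top predictions with scores
```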

## Performance

| Metric | Value |
|--------|-------|
| Accuracy | To be evaluated |
| Parameters | ~5.5M |
| Model Size | ~22 MB |
| Input Size | 224x224 pixels |
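
As a sanity check on the parameter and size figures, you can instantiate the architecture from this repo's `config.json` and count parameters (a quick sketch; numbers are approximate):

```python
from transformers import ViTConfig, ViTForImageClassification

# Mirror the key fields of this repo's config.json (ViT-Tiny geometry)
config = ViTConfig(
    hidden_size=192,
    num_hidden_layers=12,
    num_attention_heads=3,
    intermediate_size=768,
    image_size=224,
    patch_size=16,
    num_labels=16,
)
model = ViTForImageClassification(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~5.5M; ~22 MB in float32 (4 bytes each)
```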

## Training Details

- **Student Architecture**: Vision Transformer (ViT-Tiny)
- **Teacher Model**: microsoft/dit-large-finetuned-rvlcdip
- **Distillation Method**: Knowledge Distillation (see the loss sketch below)
- **Input Resolution**: 224x224
- **Preprocessing**: Resize to 224x224, rescale by 1/255, normalize with mean 0.5 and std 0.5 per channel (see `preprocessor_config.json`)
- **Framework**: Transformers/PyTorch
- **Distillation Benefits**: ~55x fewer parameters than the teacher, with the goal of retaining competitive accuracy
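
The exact training recipe is not documented in this repo. For reference, a standard logit-distillation objective looks like the following (a minimal sketch; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not the values used here):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft loss: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes stable
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: ordinary cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```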

## Dataset

The RVL-CDIP (Ryerson Vision Lab Complex Document Information Processing) dataset contains:

- 400,000 grayscale document images
- 16 document categories
- Images collected from the Truth Tobacco Industry Documents archive
- Standard train/validation/test splits (320,000 / 40,000 / 40,000)
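
If you want to evaluate the model yourself, the dataset can typically be loaded from the Hub (a sketch, assuming the `rvl_cdip` dataset id referenced in the metadata above is available to you):

```python
from datasets import load_dataset

# Load the test split; labels are integers matching the model's id2label mapping
ds = load_dataset("rvl_cdip", split="test")
print(ds[0]["image"], ds[0]["label"])
```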

## Citation

```bibtex
@misc{hammale2025vit_tiny_rvlcdip_distilled,
  title={ViT-Tiny Classifier for RVL-CDIP Document Classification (Distilled)},
  author={Hammale, Mourad},
  year={2025},
  howpublished={\url{https://huggingface.co/HAMMALE/vit-tiny-classifier-rvlcdip}},
  note={Knowledge distilled from microsoft/dit-large-finetuned-rvlcdip}
}
```

## Acknowledgments

This model was created by HAMMALE (Mourad) through knowledge distillation from the larger DiT-Large model (microsoft/dit-large-finetuned-rvlcdip), achieving significant compression while targeting competitive performance on document classification tasks.

## License

This model is released under the Apache 2.0 license.
config.json
ADDED
@@ -0,0 +1,61 @@
{
  "architectures": [
    "ViTForImageClassification"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 192,
  "id2label": {
    "0": "letter",
    "1": "form",
    "2": "email",
    "3": "handwritten",
    "4": "advertisement",
    "5": "scientific_report",
    "6": "scientific_publication",
    "7": "specification",
    "8": "file_folder",
    "9": "news_article",
    "10": "budget",
    "11": "invoice",
    "12": "presentation",
    "13": "questionnaire",
    "14": "resume",
    "15": "memo"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 768,
  "label2id": {
    "advertisement": 4,
    "budget": 10,
    "email": 2,
    "file_folder": 8,
    "form": 1,
    "handwritten": 3,
    "invoice": 11,
    "letter": 0,
    "memo": 15,
    "news_article": 9,
    "presentation": 12,
    "questionnaire": 13,
    "resume": 14,
    "scientific_publication": 6,
    "scientific_report": 5,
    "specification": 7
  },
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 3,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "pooler_act": "tanh",
  "pooler_output_size": 192,
  "problem_type": "single_label_classification",
  "qkv_bias": true,
  "torch_dtype": "float32",
  "transformers_version": "4.52.4"
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:715e45c6eac8d55c30fa550cc387e5c8508e2beda741c90cf59371d2579a55b5
size 22132736
preprocessor_config.json
ADDED
@@ -0,0 +1,23 @@
{
  "do_convert_rgb": null,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "ViTImageProcessor",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "height": 224,
    "width": 224
  }
}