---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: envisage
  results: []
---

# Model Card for envisage

This is the official model card for `envisage`, a Vision Transformer (ViT) model fine-tuned for image classification.

This model was fine-tuned from the `google/vit-base-patch16-224-in21k` base model on the `cifar10` dataset, which consists of 60,000 32x32 color images evenly split across 10 classes (50,000 training and 10,000 test images).

## Model Description

- **Base Model:** [`google/vit-base-patch16-224-in21k`](https://huggingface.co/google/vit-base-patch16-224-in21k)
- **Dataset:** [`cifar10`](https://huggingface.co/datasets/cifar10)
- **Task:** Image Classification
- **Framework:** PyTorch, Transformers
- **Classes (10):** `airplane`, `automobile`, `bird`, `cat`, `deer`, `dog`, `frog`, `horse`, `ship`, `truck` (see the label-mapping check below)

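If the fine-tuning script wrote these class names into the model config (as the standard `Trainer` image-classification examples do), they are exposed through `id2label` and can be checked without downloading the model weights. A minimal sketch, assuming the `louijiec/envisage` repository id used in the usage example below:

```python
from transformers import AutoConfig

# Fetch only the configuration to inspect the label mapping of the fine-tuned checkpoint.
config = AutoConfig.from_pretrained("louijiec/envisage")
print(config.id2label)
# Expected: {0: 'airplane', 1: 'automobile', ..., 9: 'truck'}
```
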
## How to Use

The easiest way to use this model for inference is with the `pipeline` API from the `transformers` library.

First, ensure you have the necessary libraries installed:

```bash
pip install transformers torch pillow
```

Then, you can use the following Python snippet to classify an image:

```python
from transformers import pipeline
from PIL import Image
import requests

# Load the classification pipeline with the fine-tuned model
pipe = pipeline("image-classification", model="louijiec/envisage")

# Load an image from a URL (e.g., a cat)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat-tree.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# Get the predictions (the pipeline returns the top classes with their scores)
predictions = pipe(image)

print("Predictions:")
for p in predictions:
    print(f"- {p['label']}: {p['score']:.4f}")

# Expected output lists the top predicted classes and their confidence scores,
# with 'cat' likely having the highest score.
```

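If you prefer to call the model directly rather than going through `pipeline`, the sketch below uses the generic `AutoImageProcessor` and `AutoModelForImageClassification` classes. It assumes the checkpoint ships the usual ViT preprocessing config, which resizes inputs to the 224x224 resolution the base model expects (CIFAR-10 images are only 32x32, so the same upscaling happens at inference time):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "louijiec/envisage"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat-tree.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# Resize, normalize, and batch the image exactly as the processor config specifies.
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(-1).item()
print(model.config.id2label[predicted_id])
```
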

## Training Procedure

The model was trained in a Google Colab environment using the `transformers` `Trainer` API; the hyperparameters below are also shown as a `TrainingArguments` sketch after the list.

### Hyperparameters

- **Learning Rate:** 5e-5
- **Training Epochs:** 3
- **Batch Size:** 16 per device
- **Gradient Accumulation Steps:** 4 (effective batch size of 64)
- **Optimizer:** AdamW with a linear learning rate schedule
- **Warmup Ratio:** 0.1

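These settings map directly onto `TrainingArguments`. A minimal sketch assuming single-GPU training; `output_dir` and anything not listed above are illustrative placeholders rather than values from the actual run:

```python
from transformers import TrainingArguments

# Sketch of the run configuration described above. AdamW is the Trainer's default
# optimizer; lr_scheduler_type="linear" plus warmup_ratio gives the linear
# schedule with warmup mentioned in the hyperparameter list.
training_args = TrainingArguments(
    output_dir="envisage",            # placeholder
    learning_rate=5e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,    # 16 x 4 = effective batch size of 64 on one device
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
)
```
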

### Evaluation

The model was evaluated on the `cifar10` test split, which contains 10,000 images; a typical way of wiring accuracy into the `Trainer` is sketched below.

- **Final Accuracy on Test Set:** [TODO: Add final accuracy from the `trainer.evaluate()` step here. For example: 0.965]

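The training notebook itself is not part of this card, so the following `compute_metrics` callback is an assumed but standard setup using the `evaluate` library, not a copy of the actual script:

```python
import numpy as np
import evaluate

# Standard accuracy callback for Trainer-based image classification.
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```
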

## Intended Use & Limitations

This model is intended for educational purposes and as a demonstration of fine-tuning a Vision Transformer on a common benchmark dataset. It performs well on images similar to those in the `cifar10` dataset (small, low-resolution images of the 10 specified classes).

**Limitations:**

- The model will likely perform poorly on images that are significantly different from the `cifar10` data (e.g., high-resolution photos, medical images, or classes not seen during training).
- The training data may reflect biases present in the original `cifar10` dataset.