louijiec commited on
Commit
30ebcbf
·
verified ·
1 Parent(s): 2f3b145

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +58 -32
README.md CHANGED
@@ -11,51 +11,77 @@ model-index:
11
  results: []
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
 
17
- # envisage
18
 
19
- This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on an unknown dataset.
20
- It achieves the following results on the evaluation set:
21
- - Loss: 0.1283
22
- - Accuracy: 0.9828
23
 
24
- ## Model description
25
 
26
- More information needed
 
 
 
 
27
 
28
- ## Intended uses & limitations
29
 
30
- More information needed
31
 
32
- ## Training and evaluation data
 
 
 
33
 
34
- More information needed
35
 
36
- ## Training procedure
 
 
 
37
 
38
- ### Training hyperparameters
 
39
 
40
- The following hyperparameters were used during training:
41
- - learning_rate: 5e-05
42
- - train_batch_size: 16
43
- - eval_batch_size: 16
44
- - seed: 42
45
- - gradient_accumulation_steps: 4
46
- - total_train_batch_size: 64
47
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
48
- - lr_scheduler_type: linear
49
- - lr_scheduler_warmup_ratio: 0.1
50
- - num_epochs: 1
51
 
52
- ### Training results
 
53
 
 
 
 
54
 
 
 
 
55
 
56
- ### Framework versions
57
 
58
- - Transformers 4.52.4
59
- - Pytorch 2.6.0+cu124
60
- - Datasets 3.6.0
61
- - Tokenizers 0.21.1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
11
  results: []
12
  ---
13
 
14
+ # Model Card for envisage
 
15
 
16
+ This is the official model card for `envisage`, a Vision Transformer (ViT) model fine-tuned for image classification.
17
 
18
+ This model was fine-tuned from the `google/vit-base-patch16-224-in21k` base model on the `cifar10` dataset, which consists of 60,000 32x32 color images in 10 distinct classes.
 
 
 
19
 
20
+ ## Model Description
21
 
22
+ - **Base Model:** [`google/vit-base-patch16-224-in21k`](https://huggingface.co/google/vit-base-patch16-224-in21k)
23
+ - **Dataset:** [`cifar10`](https://huggingface.co/datasets/cifar10)
24
+ - **Task:** Image Classification
25
+ - **Framework:** PyTorch, Transformers
26
+ - **Classes (10):** `airplane`, `automobile`, `bird`, `cat`, `deer`, `dog`, `frog`, `horse`, `ship`, `truck`
27
 
28
+ ## How to Use
29
 
30
+ The easiest way to use this model for inference is with the `pipeline` API from the `transformers` library.
31
 
32
+ First, ensure you have the necessary libraries installed:
33
+ ```bash
34
+ pip install transformers torch pillow
35
+ ```
36
 
37
+ Then, you can use the following Python snippet to classify an image:
38
 
39
+ ```python
40
+ from transformers import pipeline
41
+ from PIL import Image
42
+ import requests
43
 
44
+ # Load the classification pipeline with your model
45
+ pipe = pipeline("image-classification", model="louijiec/envisage")
46
 
47
+ # Load an image from a URL (e.g., a cat)
48
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat-tree.jpeg"
49
+ image = Image.open(requests.get(url, stream=True).raw)
 
 
 
 
 
 
 
 
50
 
51
+ # Get the predictions
52
+ predictions = pipe(image)
53
 
54
+ print("Predictions:")
55
+ for p in predictions:
56
+ print(f"- {p['label']}: {p['score']:.4f}")
57
 
58
+ # Expected output will show the model's confidence for each class,
59
+ # with 'cat' likely having the highest score.
60
+ ```
61
 
62
+ ## Training Procedure
63
 
64
+ The model was trained in a Google Colab environment using the `transformers` `Trainer` API.
65
+
66
+ ### Hyperparameters
67
+
68
+ - **Learning Rate:** 5e-5
69
+ - **Training Epochs:** 3
70
+ - **Batch Size:** 16 per device
71
+ - **Gradient Accumulation Steps:** 4 (Effective batch size of 64)
72
+ - **Optimizer:** AdamW with a linear learning rate schedule
73
+ - **Warmup Ratio:** 0.1
74
+
75
+ ### Evaluation
76
+
77
+ The model was evaluated on the `cifar10` test split, which contains 10,000 images.
78
+
79
+ - **Final Accuracy on Test Set:** [TODO: Add final accuracy from the `trainer.evaluate()` step here. For example: 0.965]
80
+
81
+ ## Intended Use & Limitations
82
+
83
+ This model is intended for educational purposes and as a demonstration of fine-tuning a Vision Transformer on a common benchmark dataset. It performs well on images similar to those in the `cifar10` dataset (small, low-resolution images of the 10 specified classes).
84
+
85
+ **Limitations:**
86
+ - The model will likely perform poorly on images that are significantly different from the `cifar10` data (e.g., high-resolution photos, medical images, or classes not seen during training).
87
+ - The training data may reflect biases present in the original `cifar10` dataset.