Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ base_model:
 ---
 
 
-# MISHANM/
+# MISHANM/ibm-granite-vision-3.2-2b-fp16
 
 The MISHANM/ibm-granite-granite-vision-3.2-2b-fp16 model is a sophisticated vision-language model designed for image-to-text generation. It leverages advanced neural architectures to transform visual inputs into coherent textual descriptions.
 
@@ -41,7 +41,7 @@ from PIL import Image
 
 device = "cuda" if torch.cuda.is_available() else "cpu"
 
-model_path = "MISHANM/ibm-granite-
+model_path = "MISHANM/ibm-granite-vision-3.2-2b-fp16"
 processor = AutoProcessor.from_pretrained(model_path)
 model = AutoModelForVision2Seq.from_pretrained(model_path, ignore_mismatched_sizes=True).to(device)
 
@@ -113,7 +113,7 @@ Users are encouraged to critically evaluate the model's outputs, especially in s
 
 ## Citation Information
 ```
-@misc{MISHANM/ibm-granite-
+@misc{MISHANM/ibm-granite-vision-3.2-2b-fp16,
 author = {Mishan Maurya},
 title = {Introducing Image to Text Generation model},
 year = {2025},