---
language: en
license: apache-2.0
tags:
- vision
- vqa
- 16bit
- quantized
---

# PaliGemma-3b-ft-vizwizvqa-224 (16-bit Quantized)

This is a 16-bit quantized version of `google/paligemma-3b-ft-vizwizvqa-224`, a PaliGemma model fine-tuned for visual question answering on the VizWiz dataset.

## Usage

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

processor = AutoProcessor.from_pretrained("akazen/paligemma-3b-ft-vizwizvqa-16bit")
model = AutoModelForImageTextToText.from_pretrained(
    "akazen/paligemma-3b-ft-vizwizvqa-16bit",
    device_map="auto",
)

# Load an image and build the VQA prompt
image = Image.open("your_image.jpg").convert("RGB")
question = "What's in this image?"
prompt = f"<image>\nQuestion: {question}\nAnswer:"

# Run the model and decode the output
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
answer = processor.decode(outputs[0], skip_special_tokens=True)
```
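
Note that the sequence returned by `generate` typically includes the prompt tokens, so `answer` above contains the prompt text followed by the model's answer. If you only want the answer itself, a minimal sketch (continuing from the example above) is to drop the prompt tokens before decoding:

```python
# Continues from the example above: keep only the newly generated tokens.
input_len = inputs["input_ids"].shape[-1]
answer_only = processor.decode(outputs[0][input_len:], skip_special_tokens=True)
print(answer_only.strip())
```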