Instructions for using ibm-granite/granite-vision-3.1-2b-preview with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ibm-granite/granite-vision-3.1-2b-preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ibm-granite/granite-vision-3.1-2b-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ibm-granite/granite-vision-3.1-2b-preview")
model = AutoModelForImageTextToText.from_pretrained("ibm-granite/granite-vision-3.1-2b-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
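Both snippets above pass the same chat-style `messages` structure: a list of turns, each with a `role` and a `content` list that mixes `image` and `text` parts. As a minimal sketch, the hypothetical helper below (`build_image_question`, not part of Transformers) builds that structure so the image URL and question are not duplicated between the pipeline and the direct-loading examples.

```python
from typing import Any


def build_image_question(image_url: str, question: str) -> list[dict[str, Any]]:
    """Hypothetical helper: a single user turn with one image and one question.

    The shape mirrors the `messages` list passed to the pipeline and to
    `processor.apply_chat_template` in the examples above.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        },
    ]


messages = build_image_question(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    "What animal is on the candy?",
)
```

The resulting `messages` can be passed to `pipe(text=messages)` or to `processor.apply_chat_template(messages, ...)` unchanged.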


vLLM

How to use ibm-granite/granite-vision-3.1-2b-preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ibm-granite/granite-vision-3.1-2b-preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ibm-granite/granite-vision-3.1-2b-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
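For scripting, the same request can be sent from Python. The sketch below uses only the standard library (`urllib.request`, `json`) instead of an OpenAI client package; the helper names `build_chat_request` and `send` are hypothetical, and the base URL assumes vLLM's default port 8000, as in the curl call.

```python
import json
import urllib.request
from typing import Any

MODEL_ID = "ibm-granite/granite-vision-3.1-2b-preview"


def build_chat_request(
    base_url: str, model: str, text: str, image_url: str
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint.

    The JSON body mirrors the curl example: one user turn with a text part
    and an image_url part.
    """
    payload: dict[str, Any] = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict[str, Any]:
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "http://localhost:8000",
    MODEL_ID,
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# send(request) would perform the actual HTTP call against the vLLM server.
```

Building the request separately from sending it keeps the payload easy to inspect or reuse against another OpenAI-compatible server by changing only `base_url`.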

Use Docker

docker model run hf.co/ibm-granite/granite-vision-3.1-2b-preview

SGLang

How to use ibm-granite/granite-vision-3.1-2b-preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ibm-granite/granite-vision-3.1-2b-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ibm-granite/granite-vision-3.1-2b-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
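Whichever server is used (vLLM on port 8000 or SGLang on port 30000), the reply follows the OpenAI chat-completion schema, so the generated text sits under `choices[0].message.content`. A minimal sketch of extracting it in Python; `extract_reply` is a hypothetical helper, and the sample response is made up purely to show the shape, not real model output.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion response.

    Assumes the standard schema: response["choices"][0]["message"]["content"].
    """
    choices = response.get("choices") or []
    if not choices:
        # On failure the body usually carries an error description instead of choices.
        raise ValueError(f"No choices in response: {response}")
    content = choices[0]["message"].get("content")
    return content if content is not None else ""


# Illustrative response shape only; not real output from the model.
sample = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A statue on an island."},
            "finish_reason": "stop",
        }
    ],
}
reply = extract_reply(sample)
```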

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ibm-granite/granite-vision-3.1-2b-preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ibm-granite/granite-vision-3.1-2b-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
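When the server runs in Docker, it typically needs time to download and load the weights before it accepts requests, so a curl call sent immediately may fail. The sketch below polls until the server answers; it assumes the server exposes `GET /v1/models` as part of its OpenAI-compatible API, the helper names are hypothetical, and `probe` is injectable so the loop can be exercised without a server.

```python
import http.client
import time
import urllib.request
from typing import Callable


def models_endpoint_ok(base_url: str) -> bool:
    """Probe GET {base_url}/v1/models; True only on HTTP 200 (endpoint assumed)."""
    url = base_url.rstrip("/") + "/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP errors (URLError and HTTPError
        # are OSError subclasses) all mean "not ready yet".
        return False


def wait_until_ready(
    base_url: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = models_endpoint_ok,
) -> bool:
    """Call `probe` every `interval_s` seconds until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:30000")` can be called before the first chat request; the 10-minute default timeout is an arbitrary choice.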


Errors with quantized model

by tatyanavidrevich - opened Feb 13, 2025

Discussion

tatyanavidrevich

Feb 13, 2025

I am using the following quantization method:

from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
)
model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview",
    quantization_config=bnb_config,
)

During generation, I get an error:
/usr/local/lib/python3.11/dist-packages/torch/nn/functional.py in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v, average_attn_weights, is_causal)
6249 attn_output.transpose(0, 1).contiguous().view(tgt_len * bsz, embed_dim)
6250 )
-> 6251 attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
6252 attn_output = attn_output.view(tgt_len, bsz, attn_output.size(1))
6253

RuntimeError: self and mat2 must have the same dtype, but got Half and Byte

It works fine without quantization. However, quantization is useful during fine-tuning; could you please suggest how to make it work?

Thank you

aarbelle

Feb 16, 2025

Thank you for raising this issue. We managed to reproduce the error and are currently investigating.

elischwartz

IBM Granite org Feb 16, 2025

Hi @tatyanavidrevich

There's an issue with the quantization of the vision encoder.
Quantizing with the following config should work:

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["vision_tower", "lm_head"],  # Skip problematic modules
    llm_int8_enable_fp32_cpu_offload=True,
)
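The key setting is `llm_int8_skip_modules`: despite the `int8` in its name, Transformers also uses it for 4-bit loading, leaving the listed modules in their original precision. A plausible reading of the traceback is that `torch.nn.functional.multi_head_attention_forward` reads `out_proj.weight` directly instead of calling the layer, so a 4-bit-packed (Byte) weight inside the vision tower produces exactly this dtype mismatch. The sketch below previews which modules a skip list would exclude; `is_skipped` is a hypothetical helper modeled on the name check in Transformers' bitsandbytes integration at the time of writing (treat it as an approximation), and the module paths are illustrative, not read from the real model.

```python
def is_skipped(module_name: str, skip_modules: list[str]) -> bool:
    """Approximate check: would this module be left unquantized?

    Assumed rule: skip when the leaf name is listed, when the full dotted
    name equals an entry, or when "<entry>." occurs anywhere in the name.
    """
    leaf = module_name.rsplit(".", 1)[-1]
    return leaf in skip_modules or any(
        key + "." in module_name or key == module_name for key in skip_modules
    )


SKIP = ["vision_tower", "lm_head"]

# Hypothetical module paths, for illustration only.
names = [
    "model.vision_tower.vision_model.head.attention.out_proj",
    "model.language_model.layers.0.self_attn.q_proj",
    "lm_head",
]
quantized = [n for n in names if not is_skipped(n, SKIP)]
```

Because the match is prefix-based, listing `vision_tower` once covers every nested layer of the encoder, including its attention output projections.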

tatyanavidrevich

Feb 16, 2025

Thank you, I will give it a try. I am basically trying to reduce the model size so that I can fine-tune it on an A100 GPU.

elischwartz

IBM Granite org Feb 16, 2025

Check out the example here:
https://huggingface.co/learn/cookbook/en/fine_tuning_granite_vision_sft_trl

I still need to push the quantization fix there, but full fine-tuning works on an A100.
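A back-of-the-envelope estimate shows why full fine-tuning of a model this size can fit on an A100 (40 GB or 80 GB). The sketch assumes the nominal 2B parameters from the model name (the exact count differs) and the common mixed-precision AdamW rule of thumb of 16 bytes per parameter; activations, which grow with batch size, sequence length, and image resolution, are not included and can add substantially. `training_memory_gb` is a hypothetical helper.

```python
def training_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough memory for weights/optimizer state in GB (1 GB = 1e9 bytes); ignores activations."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 2e9  # nominal size from the model name; the exact count differs

# Assumed per-parameter costs (rules of thumb, not measured values):
weights_4bit = training_memory_gb(N_PARAMS, 0.5)  # 4-bit weights only
weights_bf16 = training_memory_gb(N_PARAMS, 2)    # bf16 weights only
# 2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master weights) + 8 (two fp32 Adam moments)
full_ft_adamw = training_memory_gb(N_PARAMS, 16)
```

Under these assumptions the weights and optimizer state come to about 32 GB for full fine-tuning, so the headroom left for activations usually decides whether a given batch size fits; 4-bit weights alone would take about 1 GB.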

tatyanavidrevich

Feb 16, 2025

It works, thank you! This is very helpful

aarbelle

Feb 16, 2025

Thank you, @elischwartz. I am closing this issue for now.

aarbelle changed discussion status to closed Feb 16, 2025
