Generated by Optimum using the following export command:
optimum-cli export openvino \
  --model ibm-granite/granite-3.2-2b-instruct \
  --task text-generation-with-past \
  --weight-format int4 \
  --trust-remote-code \
  --group-size 128 \
  --quant-mode int4_f8e4m3 \
  ov
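The same INT4 weight compression can also be driven from Python through the optimum-intel API. The snippet below is a minimal sketch and is not taken from this card: it mirrors only the --weight-format int4 and --group-size 128 options, and does not reproduce the --quant-mode int4_f8e4m3 activation setting used in the CLI command above.

```python
# Minimal sketch (assumption: not part of the original card) of an INT4
# weight-only export using the optimum-intel Python API instead of the CLI.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

model_id = "ibm-granite/granite-3.2-2b-instruct"
quant_config = OVWeightQuantizationConfig(bits=4, group_size=128)

# Convert the PyTorch checkpoint to OpenVINO IR with INT4-compressed weights.
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    quantization_config=quant_config,
    trust_remote_code=True,
)
model.save_pretrained("ov")

# Save the Hugging Face tokenizer files next to the IR.
AutoTokenizer.from_pretrained(model_id).save_pretrained("ov")
```

Note that the CLI route additionally converts the tokenizer to OpenVINO format (openvino_tokenizer.xml / openvino_detokenizer.xml), which openvino_genai expects, so the command above remains the more complete way to reproduce this repository.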
Usage with openvino_genai
pip install optimum-intel@git+https://github.com/huggingface/optimum-intel.git
pip install openvino-genai fastapi uvicorn huggingface_hub
pip install nncf
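As a quick sanity check (not part of the original card), you can confirm that the installed packages import cleanly before running the example below:

```python
# Optional sanity check: verify that the runtime pieces are importable.
import openvino_genai
import nncf
import huggingface_hub

print("openvino_genai", openvino_genai.__version__)
print("nncf", nncf.__version__)
print("huggingface_hub", huggingface_hub.__version__)
```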
import huggingface_hub as hf_hub
import openvino_genai as ov_genai

# Download the pre-converted OpenVINO model from the Hugging Face Hub.
model_id = "hsuwill000/granite-3.2-2b-instruct_int4_ov"
model_path = "ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)

# Build an LLM pipeline that runs on the CPU device.
pipe = ov_genai.LLMPipeline(model_path, "CPU")

# Streamer callback: print each generated subword as soon as it arrives.
def streamer(subword):
    print(subword, end='', flush=True)
    # The returned flag tells the pipeline whether generation should continue.
    return ov_genai.StreamingStatus.RUNNING

# Interactive chat loop: read a question, stream the answer, report throughput.
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    gen_result = pipe.generate([prompt], streamer=streamer, max_new_tokens=32768)
    print(f"\n--- TPS --- {gen_result.perf_metrics.get_throughput().mean:.2f} tokens/s")
    print('\n----------\n')
pipe.finish_chat()
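For a single, non-interactive request, a GenerationConfig can be passed instead of per-call keyword arguments. The sketch below is illustrative rather than part of the original card and assumes the same "ov" folder downloaded above; perf_metrics exposes further counters such as time to first token.

```python
# One-shot generation sketch (assumption: illustrative, not from the card).
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("ov", "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 256

result = pipe.generate(["What is OpenVINO?"], config)
print(result.texts[0])

metrics = result.perf_metrics
print(f"TTFT: {metrics.get_ttft().mean:.2f} ms")
print(f"Throughput: {metrics.get_throughput().mean:.2f} tokens/s")
```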
Model tree for hsuwill000/granite-3.2-2b-instruct_int4_ov
- Base model: ibm-granite/granite-3.1-2b-base
- Finetuned: ibm-granite/granite-3.1-2b-instruct
- Finetuned: ibm-granite/granite-vision-3.2-2b