```bash
# Create a dedicated Python env
python3 -m venv llmcompressor
source llmcompressor/bin/activate

# Install llm-compressor and the additional libraries it needs
pip install llmcompressor qwen_vl_utils torchvision

# Download the model into the HF cache
hf download Qwen/Qwen2.5-VL-7B-Instruct

# Start quantization (fetch the raw script, not the GitHub HTML page)
wget https://raw.githubusercontent.com/vllm-project/llm-compressor/main/examples/quantization_w8a8_fp8/qwen_2_5_vl_example.py -O qwen_2_5_vl_fp8.py
python3 qwen_2_5_vl_fp8.py
```
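For reference, the downloaded example performs a single one-shot FP8-dynamic pass with llm-compressor's `QuantizationModifier`. The sketch below follows the upstream FP8 examples; the exact model class, `ignore` patterns, and save directory are assumptions to double-check against `qwen_2_5_vl_fp8.py`:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"

# Load the model in its native precision (BF16) along with its processor.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# FP8-dynamic: weights are quantized to FP8 (E4M3) ahead of time, while
# activation scales are computed per token at runtime, so no calibration
# data is needed. The ignore patterns (lm_head and the vision tower,
# assumed here from the upstream example) are kept in higher precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["re:.*lm_head", "re:visual.*"],
)

# Apply the recipe in one pass and save the compressed checkpoint.
oneshot(model=model, recipe=recipe)

SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)
```

Because `FP8_DYNAMIC` derives activation scales at runtime, the pass is data-free; static-activation schemes such as W8A8 would additionally require a calibration dataset.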
Format: Safetensors · Model size: 8B params · Tensor types: BF16, F8_E4M3
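To run the quantized checkpoint locally, vLLM loads the compressed-tensors format that llm-compressor produces. A minimal text-only smoke test (a sketch; the repo ID is taken from this model card, and `max_model_len` is an illustrative choice):

```python
from vllm import LLM, SamplingParams

# vLLM reads FP8-Dynamic checkpoints produced by llm-compressor directly.
llm = LLM(model="ig1/Qwen2.5-VL-7B-Instruct-FP8-Dynamic", max_model_len=8192)

outputs = llm.generate(
    ["Describe this model in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```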