How do I quantize jina-embeddings-v4?

#53
by Kong-Mei - opened

Hi all! I need to quantize jina-embeddings-v4 to INT8. Has anyone tried this or can share any guidance?
Any tips, examples, or references would be really appreciated.

Thanks!

Jina AI org

Hi @Kong-Mei, do you want to quantize the whole model or just the embeddings? We will soon publish quantized versions for INT8 and binary embeddings.
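
For anyone reading along before the official versions are out: if "just the embeddings" is enough, the output vectors can be quantized after the model has produced them, without touching the model weights. Below is a minimal, dependency-free sketch of symmetric per-vector INT8 quantization and sign-based binary quantization. It is an illustration only, not Jina's published method. The function names (`quantize_int8`, `dequantize_int8`, `quantize_binary`, `hamming_distance`) are hypothetical, and the per-vector scale and the `> 0` threshold are assumptions.

```python
from typing import List, Sequence, Tuple


def quantize_int8(vec: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector INT8 quantization (assumed scheme).

    Returns integer codes in [-127, 127] plus the scale needed to
    map them back to approximate float values.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    if max_abs == 0.0:
        # All-zero (or empty) vector: any scale works, use 1.0.
        return [0] * len(vec), 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes: Sequence[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original float vector."""
    return [c * scale for c in codes]


def quantize_binary(vec: Sequence[float]) -> bytes:
    """One bit per dimension: 1 where the value is > 0, packed MSB first."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    emb = [0.12, -0.40, 0.03, 0.0, 0.25, -0.07, -0.31, 0.18, 0.09]
    codes, scale = quantize_int8(emb)
    print(codes, scale)
    print(quantize_binary(emb).hex())
```

Compared with float32 storage, INT8 codes take 4x less space and binary codes 32x less. Note that this only shrinks the index and speeds up similarity search; it does not make the model's forward pass faster. In practice the same logic would usually be vectorized with NumPy over a whole batch of embeddings.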

Thanks! I only need the embeddings for a multimodal retrieval task. By the way, may I ask how much speed improvement we can expect from quantization?
If possible, I’d like to quantize both the Qwen2.5-VL-3B backbone and the multi_vector_projector.

Jina AI org

Thanks!
