How do I quantize jina-embeddings-v4?

#53

by Kong-Mei - opened Jul 11

Jul 11

Hi all! I need to quantize jina-embeddings-v4 to INT8. Has anyone tried this or can share any guidance?
Any tips, examples, or references would be really appreciated.

Thanks!

jupyterjazz

Jina AI org Jul 11

Hi @Kong-Mei, do you want to quantize the whole model or just the embeddings? We will soon publish quantized versions for INT8 and binary embeddings.
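For the embeddings-only case, here is a minimal sketch of what post-hoc quantization of the output vectors could look like. It is not the official Jina method: the function names are hypothetical, the random array is only a stand-in for real model output, and the per-vector symmetric scaling is one common choice among several.

```python
import numpy as np


def quantize_int8(embeddings: np.ndarray):
    """Symmetric per-vector scalar quantization to int8 (hypothetical helper)."""
    # One scale per row, chosen so the largest |value| in the row maps to 127.
    max_abs = np.abs(embeddings).max(axis=1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(embeddings / scales), -127, 127).astype(np.int8)
    return q, scales


def dequantize_int8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Approximate reconstruction of the float vectors."""
    return q.astype(np.float32) * scales


def quantize_binary(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    return np.packbits(embeddings > 0, axis=1)


# Stand-in for model output: 4 unit-normalized vectors (dimension is a placeholder).
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 2048)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

q, scales = quantize_int8(emb)   # int8 codes, 4x smaller than float32
bits = quantize_binary(emb)      # packed sign bits, 32x smaller than float32
recon = dequantize_int8(q, scales)
```

This only shrinks storage and speeds up similarity search over the stored vectors; it does not make the model itself run faster, which would require quantizing the weights.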

Kong-Mei

Jul 14

edited Jul 14

Thanks! I only need the embeddings for a multimodal retrieval task. By the way, may I ask how much speed improvement we can expect from quantization?
If possible, I’d like to quantize both qwen2.5-VL-3b and the multi_vector_projector.

hanxiao

Jina AI org Jul 18

Here you go: https://github.com/jina-ai/jina-embeddings-v4-gguf

Kong-Mei

Jul 18

Thanks!
