Vision also working? Multimodal?

by Kufer - opened 1 day ago

Discussion

Kufer

1 day ago

•

edited 1 day ago

Hi,

Does vision also work properly with the AWQ quant, so that the model is multimodal like the original? Or is it text-only, or multimodal but with lower vision quality?

Thank you and blessings!

Edit: it seems it was quantized with the flickr30k image set? Is that a general-purpose set for quantizing multimodal models, giving the same low error rate as text-only AWQ? It would also be interesting to know whether there is much quality loss compared to FP8. Thanks!

jeffcookio

Owner about 20 hours ago

The vision layers and the multimodal projector were all ignored during quantization, so the vision portion of the model is unmodified from the original.

I haven't compared to FP8, but multimodal works really well with this checkpoint, for my use case at least.
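
For anyone wondering what "ignored during quantization" means in practice, here is a minimal sketch of the idea, not the actual recipe used for this checkpoint. It assumes modules are selected by name with regex ignore patterns; the module names, the patterns, and the `split_modules` helper are all hypothetical and will differ per architecture and per quantization tool.

```python
import re

# Hypothetical module names; real names depend on the model architecture.
MODULE_NAMES = [
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.down_proj",
    "vision_tower.layers.0.attention.q_proj",
    "multi_modal_projector.linear_1",
]

# Assumed ignore patterns: anything under the vision tower or the projector.
IGNORE_PATTERNS = [r"^vision_tower\.", r"^multi_modal_projector\."]


def split_modules(names, ignore_patterns):
    """Split module names into (to_quantize, left_untouched)."""
    compiled = [re.compile(p) for p in ignore_patterns]
    to_quantize, untouched = [], []
    for name in names:
        # A module is skipped if any ignore pattern matches its name.
        if any(p.search(name) for p in compiled):
            untouched.append(name)
        else:
            to_quantize.append(name)
    return to_quantize, untouched


to_quantize, untouched = split_modules(MODULE_NAMES, IGNORE_PATTERNS)
print("quantize:", to_quantize)
print("keep original precision:", untouched)
```

Modules in the second list keep their original weights, which is why the vision path behaves the same as in the unquantized model; only the language-model layers carry quantization error.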
