Vision also working? Multimodal?

#2
by Kufer - opened

Hi,

Does vision also work properly with the AWQ quant, so that the model is still multimodal like the original? Or is it text-only, or multimodal but with lower vision quality?

Thank you and blessings!

Edit: It seems it was quantized with the flickr30k image set? Is that a general-purpose set for quantizing multimodal models with AWQ, giving the same low error rate as text-only quantization? It would also be interesting to know whether there is much quality loss compared to FP8. Thanks!

The vision layers and the multimodal projector were all ignored during quantization, so the vision portion of the model is unmodified from the original.
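To illustrate what "ignored" means here, below is a minimal sketch of how an ignore list can split a model's modules into those left in original precision and those that get quantized. The patterns and module names are hypothetical and chosen only for illustration; they are not the actual recipe or layer names of this checkpoint.

```python
import re

# Hypothetical ignore patterns (assumption): anything belonging to the
# vision stack or the multimodal projector is skipped during quantization.
IGNORE_PATTERNS = [r".*vision.*", r".*multi_modal_projector.*"]


def is_ignored(module_name, patterns=IGNORE_PATTERNS):
    """Return True if the module name fully matches any ignore pattern."""
    return any(re.fullmatch(p, module_name) for p in patterns)


def split_modules(names, patterns=IGNORE_PATTERNS):
    """Split module names into (left unmodified, quantized)."""
    unmodified = [n for n in names if is_ignored(n, patterns)]
    quantized = [n for n in names if not is_ignored(n, patterns)]
    return unmodified, quantized


# Hypothetical module names, for illustration only.
modules = [
    "vision_tower.blocks.0.attn.qkv",
    "multi_modal_projector.linear_1",
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.down_proj",
]

unmodified, quantized = split_modules(modules)
print("left unmodified:", unmodified)
print("quantized:", quantized)
```

With a split like this, only the language-model layers have their weights changed, which is why the vision path behaves the same as in the original model.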

I haven't compared to FP8, but multimodal works really well with this checkpoint, for my use case at least.
