Vision also working? Multimodal?
#2, opened by Kufer
Hi,
Does vision also work properly with the AWQ quant, i.e. is the model still multimodal like the original? Or is it text-only, or multimodal but with lower vision quality?
Thank you and blessings!
Edit: it seems it was quantized with the flickr30k image set? Is that a general-purpose set for quantizing multimodal models, giving the same low error rate with AWQ as for text-only models? It would also be interesting to know whether there is much quality loss compared to FP8. Thanks!
The vision layers and the multimodal projector were all ignored during quantization, so the vision portion of the model is unmodified from the original.
I haven't compared it to FP8, but multimodal works really well with this checkpoint, at least for my use case.
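If you want to sanity-check which modules were skipped, here is a rough sketch of how an ignore list can be matched against module names. It assumes the `re:` prefix convention for regex entries that tooling such as llm-compressor uses. The ignore list and the module names below are hypothetical placeholders, not taken from this checkpoint, so compare against the actual `ignore` entries in the checkpoint's config before drawing conclusions.

```python
import re


def is_ignored(module_name, ignore_patterns):
    """Return True if module_name matches any entry in ignore_patterns.

    Assumption: entries starting with "re:" are regular expressions matched
    from the start of the name; all other entries are exact module names.
    """
    for pattern in ignore_patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False


# Hypothetical ignore list, for illustration only.
ignore = ["lm_head", "re:.*vision_tower.*", "re:.*multi_modal_projector.*"]

# Hypothetical module names, for illustration only.
modules = [
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "model.multi_modal_projector.linear_1",
    "model.language_model.layers.0.self_attn.q_proj",
    "lm_head",
]

status = {name: is_ignored(name, ignore) for name in modules}
for name, skipped in status.items():
    label = "left unquantized" if skipped else "quantized"
    print(f"{name}: {label}")
```

With a list like this, only the language model's linear layers would be quantized, while the vision tower, the projector and `lm_head` keep their original weights.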