q4 model size is bigger than int8 - why?
#4 by sainishanthvetsa - opened
Hi! The q4 model is larger because only its MatMul weights are quantized to 4 bits, while the weights read by Gather nodes (typically the embedding table) are left unquantized. This is because the current version of Transformers.js (v3.x) doesn't support 4-bit Gather operations. However, starting with Transformers.js v4 (currently in developer preview), you'll be able to perform 4-bit gathers, and the weights will be significantly smaller.
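To make the size difference concrete, here is a rough back-of-the-envelope sketch. The parameter counts are made up for illustration, and it assumes that unquantized Gather weights are stored as 32-bit floats and that the int8 export stores every weight in 8 bits; neither is confirmed for this particular model. It also ignores quantization scales, zero points, and graph overhead.

```python
def model_size_bytes(matmul_params, gather_params, matmul_bits, gather_bits):
    """Approximate weight storage in bytes for a given bit width per weight group."""
    return (matmul_params * matmul_bits + gather_params * gather_bits) // 8

# Hypothetical parameter counts (not taken from this model).
MATMUL_PARAMS = 85_000_000   # weights used by MatMul nodes
GATHER_PARAMS = 50_000_000   # embedding table read by Gather nodes

# Assumption: int8 export quantizes both MatMul and Gather weights to 8 bits.
int8_size = model_size_bytes(MATMUL_PARAMS, GATHER_PARAMS, 8, 8)

# Assumption: q4 export uses 4-bit MatMul weights and 32-bit float Gather weights.
q4_size = model_size_bytes(MATMUL_PARAMS, GATHER_PARAMS, 4, 32)

# What the q4 export could look like once 4-bit gathers are available.
q4_gather4_size = model_size_bytes(MATMUL_PARAMS, GATHER_PARAMS, 4, 4)

for name, size in [("int8", int8_size), ("q4", q4_size), ("q4 + 4-bit gather", q4_gather4_size)]:
    print(f"{name}: {size / 1e6:.1f} MB")
```

With these illustrative numbers the unquantized embedding table dominates the q4 file, which is why it ends up larger than int8. The effect is strongest for small models with large vocabularies, where the embedding table is a large share of the total weights.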
sainishanthvetsa changed discussion status to closed