gemma-3-1b-it-qat-int4-unquantized vs. gemma-3-1b-it-qat-q4_0-unquantized

#2
by sainishanthvetsa - opened

Can someone point out the difference between these two models in Google's HF repo — q4_0 vs. int4?

Hi @sainishanthvetsa -

Thanks for reaching out to us!

The difference between gemma-3-1b-it-qat-int4-unquantized and gemma-3-1b-it-qat-q4_0-unquantized lies in the quantization scheme each was optimized for during Quantization-Aware Training (QAT). Both are unquantized checkpoints, so you will need to quantize them before use; their names indicate the quantization format each was prepared for. The Q4_0 model is optimized for the GGUF Q4_0 format used by llama.cpp-based inference engines such as Ollama, while the int4 model is optimized for more general-purpose INT4 schemes. Choose the checkpoint that matches the format your inference engine expects.
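As a rough sketch of the GGUF path, you could quantize the Q4_0-targeted checkpoint yourself with llama.cpp's standard conversion tools (the paths below are illustrative and assume you have a llama.cpp checkout and Hub access to the gated model):

```shell
# Sketch: turn the unquantized QAT checkpoint into a Q4_0 GGUF file.
# Paths and directory names here are examples, not official instructions.

# 1. Download the unquantized QAT checkpoint from the Hub
huggingface-cli download google/gemma-3-1b-it-qat-q4_0-unquantized \
  --local-dir gemma-3-1b-it-qat

# 2. Convert the Hugging Face checkpoint to a full-precision GGUF file
#    (convert_hf_to_gguf.py ships with llama.cpp)
python convert_hf_to_gguf.py gemma-3-1b-it-qat --outfile gemma-3-1b-it.gguf

# 3. Quantize to Q4_0 -- the exact scheme this checkpoint was trained to tolerate
./llama-quantize gemma-3-1b-it.gguf gemma-3-1b-it-q4_0.gguf Q4_0
```

Because the QAT run simulated Q4_0 during training, this final quantization step should cost much less quality than quantizing an ordinary checkpoint the same way.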

For more details, please refer to the official blog post on Gemma 3 QAT models: https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/

Thanks!