gemma-3-1b-it-qat-int4-unquantized vs. gemma-3-1b-it-qat-q4_0-unquantized

#2
by sainishanthvetsa - opened

Can someone point out the difference between these two models in Google's HF repo — q4_0 vs. int4?

Hi @sainishanthvetsa -

Thanks for reaching out to us!

The difference between gemma-3-1b-it-qat-int4-unquantized and gemma-3-1b-it-qat-q4_0-unquantized lies in the quantization scheme each was optimized for during Quantization-Aware Training (QAT). Both are unquantized checkpoints, so you will need to quantize them before use; their names indicate the quantization format each was prepared for. The Q4_0 model is optimized for the GGUF Q4_0 format used by llama.cpp-based inference engines such as Ollama, while the int4 model is optimized for more general-purpose INT4 schemes. Choose the checkpoint that matches the format your inference engine expects.
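As a rough sketch of the GGUF path, you could quantize the Q4_0-targeted checkpoint yourself with llama.cpp's standard conversion tools (the paths below are illustrative and assume you have a llama.cpp checkout and Hub access to the gated model):

```shell
# Sketch: turn the unquantized QAT checkpoint into a Q4_0 GGUF file.
# Paths and directory names here are examples, not official instructions.

# 1. Download the unquantized QAT checkpoint from the Hub
huggingface-cli download google/gemma-3-1b-it-qat-q4_0-unquantized \
  --local-dir gemma-3-1b-it-qat

# 2. Convert the Hugging Face checkpoint to a full-precision GGUF file
#    (convert_hf_to_gguf.py ships with llama.cpp)
python convert_hf_to_gguf.py gemma-3-1b-it-qat --outfile gemma-3-1b-it.gguf

# 3. Quantize to Q4_0 -- the exact scheme this checkpoint was trained to tolerate
./llama-quantize gemma-3-1b-it.gguf gemma-3-1b-it-q4_0.gguf Q4_0
```

Because the QAT run simulated Q4_0 during training, this final quantization step should cost much less quality than quantizing an ordinary checkpoint the same way.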

For more details, please refer to the official blog post on Gemma 3 QAT models: https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/

Thanks!