Model attributes clarification

#18 opened by jvoid

Hi guys
Please clarify the following for this sample set of models:
mlx-community/gemma-3n-E2B-it-lm-4bit
mlx-community/gemma3-12b-it-4bit-DWQ
mlx-community/gemma-3-1b-it-DQ
mlx-community/gemma-3-27b-it-qat-6bit

  1. What are 3 and 3n? Are they extra attributes alongside the param count, quants, etc., or do they just refer to the Gemma 3 version?
    And how does 3n actually differ from plain 3?
  2. For E2B (that's 2B params, right?), what does the prefix E mean?
  3. As I've figured out, it stands for instruction tuned. Still, what does lm mean? Is it just LargeModel? And if so, does that mean the rest of the models aren't?
  4. What is qat?
  5. What do DWQ and DQ stand for?

Are there any other well-known tag names? Or is there maybe a glossary that lists them all?

Another question: why do the attributes in the model name itself differ from the meta attributes in the description for some models?

E.g.:

mlx-community/gemma3-12b-it-4bit-DWQ
12B and 4bit in the name differ from the meta attributes in the description:
Model size 2B params
Tensor type BF16·U32

mlx-community/gemma-3n-E4B-it-4bit
4B and 4bit in the name differ from the meta attributes in the description:
Model size 1.56B params (2)
Tensor type BF16·U32

Thank you

MLX Community org

Hi @jvoid , Gemma 3n is a distinct family of models, primarily designed for on-device execution. You can read more about it in our blog post or Google's. The E in these models' names comes from the "Effective" number of params: E4B, for instance, behaves similarly to a 4B model in terms of performance and quality, but uses fewer parameters, as you noticed. This naming is just a Google convention for this family of models; it's not standard nomenclature among other open models.

For the lm suffix, I'm guessing the author of this MLX version used it to signal that they converted just the language portion of the model. The original model is capable of handling images and audio inputs (in addition to text inputs), so my guess is that mlx-community/gemma-3n-E2B-it-lm-4bit only handles text input.
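If you only need text input, an lm model should work with mlx-lm like any other text model. A minimal sketch, assuming the usual mlx-lm load/generate API (I haven't tested this exact checkpoint):

```python
from mlx_lm import load, generate

# Load the text-only (lm) conversion; it handles text prompts but
# not the image/audio inputs the original multimodal model accepts.
model, tokenizer = load("mlx-community/gemma-3n-E2B-it-lm-4bit")

prompt = "Explain quantization in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```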

QAT is Quantization-Aware Training, a family of training methods that try to improve the quality of a model after quantization. They do this by inserting lower-precision operations during training, so the model learns to tolerate the precision loss it will actually see once quantized.
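To make that concrete, here's a toy sketch of the "fake quantization" step that QAT methods typically insert during training. This is illustrative only, not Google's actual QAT recipe:

```python
import numpy as np

def fake_quantize(w, bits=4):
    # Symmetric round-to-nearest: quantize to `bits` and dequantize
    # immediately, so the forward pass sees the same rounding error
    # the deployed quantized model will have.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

# During QAT, the forward pass uses the fake-quantized weights, while
# gradients update the full-precision copy (straight-through estimator).
w = np.random.randn(16, 16).astype(np.float32)
x = np.random.randn(16).astype(np.float32)
y = fake_quantize(w, bits=4) @ x
```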

DWQ is Distilled Weight Quantization. It (more or less) compares the outputs of the quantized model against a full-precision reference model while fine-tuning on additional data, achieving much higher quality than other "static" quantization methods for the same reduction in size. The trade-offs: you need representative data, and the process is slow.
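Conceptually, the objective is a distillation loss between the two models on calibration data. A rough sketch with stand-in logits (hypothetical, not the actual mlx-lm implementation):

```python
import numpy as np

def kl_divergence(ref_logits, q_logits):
    # KL(reference || quantized), averaged over positions: how far the
    # quantized model's next-token distribution drifts from the reference.
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p, q = softmax(ref_logits), softmax(q_logits)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

# Stand-ins for per-token logits from both models on a calibration batch;
# DWQ-style tuning minimizes this loss w.r.t. the quantization parameters.
ref_logits = np.random.randn(4, 32).astype(np.float32)
q_logits = ref_logits + 0.1 * np.random.randn(4, 32).astype(np.float32)
print(kl_divergence(ref_logits, q_logits))
```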

Thank you very much @pcuenq

jvoid changed discussion status to closed
MLX Community org

@pcuenq Great explanations of things that eluded me for a while
