Model attributes clarification
Hi guys
Please, for the following sample of models:
mlx-community/gemma-3n-E2B-it-lm-4bit
mlx-community/gemma3-12b-it-4bit-DWQ
mlx-community/gemma-3-1b-it-DQ
mlx-community/gemma-3-27b-it-qat-6bit
- What are `3` and `3n`? Are they some extra attribute alongside the param count, quants, etc., or do they just refer to the Gemma 3 version? And how does `3n` actually differ from plain `3`?
- For `E2B` (as in 2B params, right?), what does the prefix `E` mean?
- As I've figured out, `it` stands for "instruction tuned". Still, what does `lm` mean? Is it just "Large Model"? And if so, does that mean the other models aren't?
- What is `qat`?
- What do `DWQ` and `DQ` stand for?
Are there any other well-known tag names? Or is there maybe a glossary that covers them all?
Another question is why, for some models, the attributes in the name itself differ from the meta attributes in the description. E.g.:
mlx-community/gemma3-12b-it-4bit-DWQ
12B and 4-bit in the name differ from the meta attributes in the description:
Model size: 2B params
Tensor type: BF16·U32
mlx-community/gemma-3n-E4B-it-4bit
4B and 4-bit differ from the meta attributes in the description:
Model size: 1.56B params
Tensor type: BF16·U32
Thank you
Hi @jvoid, Gemma 3n is a distinct family of models; it was primarily designed for on-device execution. You can read more about it in our blog post or Google's. These models use `E` in their names; it comes from the "Effective" number of params. What it means is that `E4B`, for instance, would behave similarly to a 4B model in terms of performance and quality, but it uses fewer parameters, as you noticed. This naming is just a Google convention for this family of models; it's not standard nomenclature among other open models.
For the `lm` suffix, I'm guessing the author of this MLX version used it to signal that they converted just the language portion of the model. The original model is capable of handling images and audio inputs (in addition to text inputs), so my guess is that mlx-community/gemma-3n-E2B-it-lm-4bit only handles text input.
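To make that concrete, here's a minimal usage sketch (assuming the `mlx-lm` package is installed on an Apple-silicon machine; the prompt and `max_tokens` are just placeholders):

```python
# Minimal sketch, assuming the mlx-lm package is installed.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-3n-E2B-it-lm-4bit")

# Text in, text out: there is no way to pass images or audio to this variant.
prompt = "Explain quantization-aware training in one sentence."
print(generate(model, tokenizer, prompt=prompt, max_tokens=100))
```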
`QAT` is Quantization-Aware Training, a family of training methods that try to improve the quality of the model after it has been quantized. They do this by inserting lower-precision operations while training, so the model learns to behave better when lower precision is actually used (after the model has been quantized).
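As an illustration of the idea only (this is not Google's actual QAT recipe for Gemma), a toy "fake-quant" layer might look like this: the forward pass rounds the weights to 4-bit levels, while a straight-through estimator keeps gradients flowing so the network learns to compensate for the quantization error during training.

```python
# Toy fake-quantization layer in PyTorch; illustrative only.
import torch

class FakeQuantLinear(torch.nn.Linear):
    def forward(self, x):
        w = self.weight
        scale = w.abs().max() / 7          # signed 4-bit range is [-8, 7]
        w_q = (w / scale).round().clamp(-8, 7) * scale
        w_ste = w + (w_q - w).detach()     # straight-through estimator
        return torch.nn.functional.linear(x, w_ste, self.bias)

layer = FakeQuantLinear(16, 16)
out = layer(torch.randn(2, 16))            # train as usual on `out`
```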
DWQ
is Differential Weight Quantization. It (more or less) compares the results of a quantized model against a reference model while fine-tuning on more data, achieving much higher quality than other "static" quantization methods for the same reduction in size. However, you need representative data, and the process is slow.
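Conceptually, one fine-tuning step could look like the sketch below; the function name and the KL loss are illustrative, not the exact DWQ implementation used for these checkpoints. The quantized model's output distribution is pushed toward that of the full-precision reference on calibration data, which is why representative data matters.

```python
# Conceptual sketch of distillation-style quantization fine-tuning.
import torch
import torch.nn.functional as F

def dwq_step(quantized_model, reference_model, batch, optimizer):
    with torch.no_grad():
        teacher_logits = reference_model(batch)   # full-precision reference
    student_logits = quantized_model(batch)       # quantized model being tuned
    # Push the quantized model's output distribution toward the reference's.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```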