Model attributes clarification

#18 opened by jvoid

Hi guys
Please clarify the following for this sample set of models:
mlx-community/gemma-3n-E2B-it-lm-4bit
mlx-community/gemma3-12b-it-4bit-DWQ
mlx-community/gemma-3-1b-it-DQ
mlx-community/gemma-3-27b-it-qat-6bit

  1. What are 3 and 3n? Are they extra attributes alongside the param count, quants, etc., or do they just refer to the Gemma 3 version?
    And how does 3n actually differ from plain 3?
  2. For E2B (that's 2B params, right?), what does the prefix E mean?
  3. As I've figured out, it stands for instruction tuned. Still, what does lm mean? Is it just LargeModel? And if so, does that mean the rest of the models aren't?
  4. What is qat?
  5. What do DWQ and DQ stand for?

Are there any other well-known tag names? Or is there maybe a glossary that lists them all?

Another question: why do the attributes in the model name itself differ from the meta attributes in the description for some models?

E.g.:

mlx-community/gemma3-12b-it-4bit-DWQ
12B and 4bit in the name differ from the meta attributes in the description:
Model size 2B params
Tensor type BF16·U32

mlx-community/gemma-3n-E4B-it-4bit
4B and 4bit in the name differ from the meta attributes in the description:
Model size 1.56B params (2)
Tensor type BF16·U32

Thank you

MLX Community org

Hi @jvoid , Gemma 3n is a distinct family of models, primarily designed for on-device execution. You can read more about it in our blog post or Google's. The E in these models' names comes from the "Effective" number of params: E4B, for instance, behaves similarly to a 4B model in terms of performance and quality, but uses fewer parameters, as you noticed. This naming is just a Google convention for this family of models; it's not standard nomenclature among other open models.

For the lm suffix, I'm guessing the author of this MLX version used it to signal that they converted just the language portion of the model. The original model is capable of handling images and audio inputs (in addition to text inputs), so my guess is that mlx-community/gemma-3n-E2B-it-lm-4bit only handles text input.
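If you only need text input, an lm model should work with mlx-lm like any other text model. A minimal sketch, assuming the usual mlx-lm load/generate API (I haven't tested this exact checkpoint):

```python
from mlx_lm import load, generate

# Load the text-only (lm) conversion; it handles text prompts but
# not the image/audio inputs the original multimodal model accepts.
model, tokenizer = load("mlx-community/gemma-3n-E2B-it-lm-4bit")

prompt = "Explain quantization in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```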

QAT is Quantization-Aware Training, a family of training methods that try to improve the quality of a model after quantization. They do this by inserting lower-precision operations during training, so the model learns to tolerate the precision loss it will actually see once quantized.
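To make that concrete, here's a toy sketch of the "fake quantization" step that QAT methods typically insert during training. This is illustrative only, not Google's actual QAT recipe:

```python
import numpy as np

def fake_quantize(w, bits=4):
    # Symmetric round-to-nearest: quantize to `bits` and dequantize
    # immediately, so the forward pass sees the same rounding error
    # the deployed quantized model will have.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

# During QAT, the forward pass uses the fake-quantized weights, while
# gradients update the full-precision copy (straight-through estimator).
w = np.random.randn(16, 16).astype(np.float32)
x = np.random.randn(16).astype(np.float32)
y = fake_quantize(w, bits=4) @ x
```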

DWQ is Distilled Weight Quantization. It (more or less) compares the outputs of the quantized model against a full-precision reference model while fine-tuning on additional data, achieving much higher quality than other "static" quantization methods for the same reduction in size. The trade-offs: you need representative data, and the process is slow.
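Conceptually, the objective is a distillation loss between the two models on calibration data. A rough sketch with stand-in logits (hypothetical, not the actual mlx-lm implementation):

```python
import numpy as np

def kl_divergence(ref_logits, q_logits):
    # KL(reference || quantized), averaged over positions: how far the
    # quantized model's next-token distribution drifts from the reference.
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p, q = softmax(ref_logits), softmax(q_logits)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

# Stand-ins for per-token logits from both models on a calibration batch;
# DWQ-style tuning minimizes this loss w.r.t. the quantization parameters.
ref_logits = np.random.randn(4, 32).astype(np.float32)
q_logits = ref_logits + 0.1 * np.random.randn(4, 32).astype(np.float32)
print(kl_divergence(ref_logits, q_logits))
```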

Thank you very much @pcuenq

jvoid changed discussion status to closed
MLX Community org

@pcuenq Great explanations of things that eluded me for a while
