---
base_model:
- google/embeddinggemma-300m
language:
- en
model_creator: Google
model_name: embeddinggemma-300m
model_type: gemma-embedding
quantized_by: s3dev-ai
tags:
- sentence-similarity
---

# Overview

This page provides various quantisations of the [base model](https://huggingface.co/google/embeddinggemma-300m), in GGUF format.

- google/embeddinggemma-300m

# Model Description

For a full model description, please refer to the [base model's](https://huggingface.co/google/embeddinggemma-300m) card.

## How are the GGUF files created?

After cloning the author's original base model repository, `llama.cpp` is used to convert the model to a GGML-compatible file, using `f32` as the output type to preserve the original fidelity. The model is converted *unaltered*, unless otherwise stated.

Finally, for each quantisation level, `llama.cpp`'s `llama-quantize` executable is called with the F32 GGUF file as the source file. A sketch of this workflow is shown in the first example at the end of this page.

## Quantisations

To help visualise the difference in model quantisation (i.e. the level of retained fidelity), the image below shows the cosine similarity scores for each quantisation, baselined against the 32-bit base model. It can be observed that lower fidelity yields a wider scatter in scores, relative to the 32-bit model.

The underlying [base dataset](https://huggingface.co/datasets/sentence-transformers/stsb) was sampled to 1000 records with an unbiased similarity score distribution. Using the various quantisation levels of this model, embeddings were created for `sentence1` and `sentence2`. Finally, a cosine similarity score was calculated across the two embeddings and plotted on the graph. A sketch of this evaluation is shown in the second example at the end of this page.

> [!NOTE]
> **Note:** This graph currently only features a single trend, which was created against the un-quantised 32-bit model. Although the quantised GGUF files are available, neither `sentence-transformers` nor `llama-cpp-python` have been updated to support the `gemma-embedding` format, so we can't use them (yet).
>
> As soon as support is available, we'll update this graph to display the fidelity for the quantisations.
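
## Example: conversion and quantisation

The snippet below is a minimal sketch of the conversion workflow described above, driven from Python. The local paths, output file names, and the chosen quantisation levels are illustrative assumptions; `convert_hf_to_gguf.py` and `llama-quantize` are the standard `llama.cpp` tools, but the exact options used to produce the files on this page may differ.

```python
"""Sketch of the f32 conversion and subsequent quantisation steps."""
import subprocess
from pathlib import Path

MODEL_DIR = Path("embeddinggemma-300m")   # local clone of the base model (assumed path)
LLAMA_CPP = Path("llama.cpp")             # local llama.cpp checkout (assumed path)
F32_GGUF = Path("embeddinggemma-300m-f32.gguf")

# 1. Convert the HF model to a full-precision (f32) GGUF file.
subprocess.run(
    [
        "python", str(LLAMA_CPP / "convert_hf_to_gguf.py"),
        str(MODEL_DIR),
        "--outtype", "f32",
        "--outfile", str(F32_GGUF),
    ],
    check=True,
)

# 2. Derive each quantisation level from the f32 source file.
for qtype in ("Q8_0", "Q5_K_M", "Q4_K_M"):   # example levels only
    out_file = f"embeddinggemma-300m-{qtype}.gguf"
    subprocess.run(["llama-quantize", str(F32_GGUF), out_file, qtype], check=True)
```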
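
## Example: cosine-similarity evaluation

The snippet below is a minimal sketch of the evaluation described in the *Quantisations* section, reproducing the un-quantised (32-bit) trend only, since the quantised GGUF files cannot yet be loaded by `sentence-transformers`. The sample size matches the description above, but the random seed and sampling strategy are assumptions, so the resulting scores may not match the plotted figures exactly.

```python
"""Sketch of the cosine-similarity evaluation against the 32-bit base model."""
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, util

# Sample 1000 records from the STS benchmark used as the base dataset.
ds = load_dataset("sentence-transformers/stsb", split="train")
ds = ds.shuffle(seed=42).select(range(1000))   # seed and strategy are assumptions

# Load the un-quantised base model (requires a recent sentence-transformers release).
model = SentenceTransformer("google/embeddinggemma-300m")

# Embed both sentence columns.
emb1 = model.encode(ds["sentence1"], convert_to_tensor=True)
emb2 = model.encode(ds["sentence2"], convert_to_tensor=True)

# Cosine similarity for each (sentence1, sentence2) pair.
scores = util.cos_sim(emb1, emb2).diagonal()
print(scores[:5])
```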