---
base_model:
- google/embeddinggemma-300m
language:
- en
model_creator: Google
model_name: embeddinggemma-300m
model_type: gemma-embedding
quantized_by: s3dev-ai
tags:
- sentence-similarity
---

# Overview

This page provides various quantisations of the [base model](https://huggingface.co/google/embeddinggemma-300m), in GGUF format.

- google/embeddinggemma-300m

# Model Description

For a full model description, please refer to the [base model's](https://huggingface.co/google/embeddinggemma-300m) card.

## How are the GGUF files created?

After cloning the author's original base model repository, `llama.cpp` is used to convert the model to a GGML-compatible file, using `f32` as the output type to preserve the original fidelity. The model is converted *unaltered*, unless otherwise stated.

Finally, for each quantisation level, `llama.cpp`'s `llama-quantize` executable is called with the F32 GGUF file as the source file. A sketch of this workflow is shown in the first example at the end of this page.

## Quantisations

To help visualise the difference in model quantisation (i.e. the level of retained fidelity), the image below shows the cosine similarity scores for each quantisation, baselined against the 32-bit base model. It can be observed that lower fidelity yields a wider scatter in scores, relative to the 32-bit model.

The underlying [base dataset](https://huggingface.co/datasets/sentence-transformers/stsb) was sampled to 1000 records with an unbiased similarity score distribution. Using the various quantisation levels of this model, embeddings were created for `sentence1` and `sentence2`. Finally, a cosine similarity score was calculated across the two embeddings and plotted on the graph. A sketch of this evaluation is shown in the second example at the end of this page.

> [!NOTE]
> **Note:** This graph currently only features a single trend, which was created against the un-quantised 32-bit model. Although the quantised GGUF files are available, neither `sentence-transformers` nor `llama-cpp-python` have been updated to support the `gemma-embedding` format, so we can't use them (yet).
>
> As soon as support is available, we'll update this graph to display the fidelity for the quantisations.
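
## Example: conversion and quantisation

The snippet below is a minimal sketch of the conversion workflow described above, driven from Python. The local paths, output file names, and the chosen quantisation levels are illustrative assumptions; `convert_hf_to_gguf.py` and `llama-quantize` are the standard `llama.cpp` tools, but the exact options used to produce the files on this page may differ.

```python
"""Sketch of the f32 conversion and subsequent quantisation steps."""
import subprocess
from pathlib import Path

MODEL_DIR = Path("embeddinggemma-300m")   # local clone of the base model (assumed path)
LLAMA_CPP = Path("llama.cpp")             # local llama.cpp checkout (assumed path)
F32_GGUF = Path("embeddinggemma-300m-f32.gguf")

# 1. Convert the HF model to a full-precision (f32) GGUF file.
subprocess.run(
    [
        "python", str(LLAMA_CPP / "convert_hf_to_gguf.py"),
        str(MODEL_DIR),
        "--outtype", "f32",
        "--outfile", str(F32_GGUF),
    ],
    check=True,
)

# 2. Derive each quantisation level from the f32 source file.
for qtype in ("Q8_0", "Q5_K_M", "Q4_K_M"):   # example levels only
    out_file = f"embeddinggemma-300m-{qtype}.gguf"
    subprocess.run(["llama-quantize", str(F32_GGUF), out_file, qtype], check=True)
```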
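
## Example: cosine-similarity evaluation

The snippet below is a minimal sketch of the evaluation described in the *Quantisations* section, reproducing the un-quantised (32-bit) trend only, since the quantised GGUF files cannot yet be loaded by `sentence-transformers`. The sample size matches the description above, but the random seed and sampling strategy are assumptions, so the resulting scores may not match the plotted figures exactly.

```python
"""Sketch of the cosine-similarity evaluation against the 32-bit base model."""
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, util

# Sample 1000 records from the STS benchmark used as the base dataset.
ds = load_dataset("sentence-transformers/stsb", split="train")
ds = ds.shuffle(seed=42).select(range(1000))   # seed and strategy are assumptions

# Load the un-quantised base model (requires a recent sentence-transformers release).
model = SentenceTransformer("google/embeddinggemma-300m")

# Embed both sentence columns.
emb1 = model.encode(ds["sentence1"], convert_to_tensor=True)
emb2 = model.encode(ds["sentence2"], convert_to_tensor=True)

# Cosine similarity for each (sentence1, sentence2) pair.
scores = util.cos_sim(emb1, emb2).diagonal()
print(scores[:5])
```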