# EmbeddingGemma-300M LiteRT

This repository contains the `google/embeddinggemma-300m` model converted to LiteRT (formerly TFLite) format for on-device inference in the MedGem Android application.

## Conversion Details

The model was converted using a custom fork of `litert-torch` to enable Matryoshka Representation Learning (MRL) optimizations.

For more details on the changes, see Pull Request #931.
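MRL training front-loads information into the leading dimensions of the embedding, so a prefix of the full vector is itself a usable lower-dimensional embedding once it is re-normalized. A minimal sketch in plain Python (the `truncate_mrl` helper and the sample vector are illustrative, not part of the conversion tooling; the EmbeddingGemma model card lists 768/512/256/128 as supported MRL sizes):

```python
import math

def truncate_mrl(embedding, dim):
    """Keep the first `dim` components of an MRL embedding and L2-normalize.

    MRL training concentrates information in the leading dimensions, so the
    truncated prefix remains a meaningful embedding after re-normalization.
    """
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Hypothetical stand-in for a full 768-dim embedding from the model.
full = [math.sin(i + 1) for i in range(768)]
small = truncate_mrl(full, 128)
print(len(small))                           # 128
print(round(sum(x * x for x in small), 6))  # 1.0 (unit length after re-normalization)
```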

## Prerequisites

```shell
# Create and activate a Python 3.12 virtual environment
uv venv --python 3.12
source .venv/bin/activate

# Clone and set up the custom litert-torch fork
git clone https://github.com/kamalkraj/litert-torch.git
cd litert-torch
git checkout embedding_gemma

# Install dependencies
uv pip install -r requirements.txt
uv pip install -e .
```

## Export Instructions

First, download the base checkpoint:

```shell
hf download google/embeddinggemma-300m --local-dir embeddinggemma-300m
```

Then run the `litert_torch.generative.examples` conversion script to export the checkpoint to TFLite with `dynamic_int8` quantization:

```shell
python -m litert_torch.generative.examples.embedding_gemma.convert_to_tflite \
    --checkpoint_path=embeddinggemma-300m \
    --output_path=. \
    --quantize=dynamic_int8 \
    --prefill_seq_lens=2048 \
    --final_l2_norm=False \
    --output_name_prefix=embedding_gemma_no_normalize
```

This produces `embedding_gemma_no_normalize_q8.tflite`, the model file used by MedGem.
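Because the model is exported with `--final_l2_norm=False`, its raw outputs are not unit-length, so the caller must L2-normalize them before computing cosine similarity (after which similarity reduces to a dot product). A hedged sketch in plain Python, with made-up vectors standing in for real interpreter outputs:

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale `vec` to unit L2 norm; `eps` guards against a zero vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity between two raw (un-normalized) embeddings."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))

# Stand-ins for two raw outputs of the _no_normalize model.
query_raw = [0.3, -1.2, 0.8, 2.0]
doc_raw = [0.6, -2.4, 1.6, 4.0]  # same direction as query_raw, twice the scale
print(round(cosine_similarity(query_raw, doc_raw), 6))  # 1.0
```

Doing the normalization once per embedding and caching the result is usually cheaper than normalizing inside every pairwise comparison.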
