# EmbeddingGemma-300M LiteRT
This repository contains the [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) model converted to LiteRT (TFLite) format for on-device inference in Android applications.
## Conversion Details
The model was converted using a custom fork of `litert-torch` that enables Matryoshka Representation Learning (MRL) optimizations. For more details on the changes, see Pull Request #931.
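MRL trains the embedding so that leading prefixes of the vector are themselves usable embeddings: you can truncate to a smaller dimension and re-normalize, trading quality for memory and speed. A minimal sketch of that consumer-side step (the 768-dim output size and the 128-dim truncation target here are illustrative, not taken from this repository):

```python
import numpy as np

def truncate_embedding(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep the leading `dims` values of an MRL embedding and L2-re-normalize."""
    truncated = embedding[:dims]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

# Stand-in for a real model output vector (assumed 768-dim for illustration).
full = np.random.default_rng(0).standard_normal(768).astype(np.float32)
small = truncate_embedding(full, 128)
```

The truncated vector can then be used for cosine-similarity search just like the full-size embedding, at a fraction of the storage cost.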
## Prerequisites
```shell
# Create and activate a Python 3.12 virtual environment
uv venv --python 3.12
source .venv/bin/activate

# Clone and set up the custom litert-torch fork
git clone https://github.com/kamalkraj/litert-torch.git
cd litert-torch
git checkout embedding_gemma

# Install dependencies
uv pip install -r requirements.txt
uv pip install -e .
```
## Export Instructions
First, download the base checkpoint:
```shell
hf download google/embeddinggemma-300m --local-dir embeddinggemma-300m
```
Then, use the `litert_torch.generative.examples` conversion script to convert it to TFLite format with `dynamic_int8` quantization:
```shell
python -m litert_torch.generative.examples.embedding_gemma.convert_to_tflite \
  --checkpoint_path=embeddinggemma-300m \
  --output_path=. \
  --quantize=dynamic_int8 \
  --prefill_seq_lens=2048 \
  --final_l2_norm=False \
  --output_name_prefix=embedding_gemma_no_normalize
```
This produces the `embedding_gemma_no_normalize_q8.tflite` file used by MedGem.
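Because the model is exported with `--final_l2_norm=False`, the graph emits raw (un-normalized) embeddings, and the consuming application is expected to L2-normalize them before computing similarities. A minimal sketch of that post-processing step (the helper names here are illustrative, not part of this repository):

```python
import numpy as np

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # The exported graph skips the final normalization layer
    # (--final_l2_norm=False), so the app normalizes raw outputs itself.
    return v / max(np.linalg.norm(v), eps)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # With unit-norm vectors, cosine similarity reduces to a dot product.
    return float(np.dot(l2_normalize(a), l2_normalize(b)))
```

Deferring normalization to application code keeps the TFLite graph simpler and lets the app combine it with MRL truncation in a single post-processing pass.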