Quantized TFLite?
Hi,
Thanks for the models, but are there any resources on how we can quantize them to INT8 or FP16?
You can use the Quantize job in AI Hub to quantize models to INT8/INT16 (weights as INT8 and activations as INT8 or INT16). TFLite models on HTP already run with FP16 activations.
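In case it helps, here is a minimal sketch of that flow using the qai_hub Python client. The ONNX path, input name/shape, calibration data, and device below are placeholders for your own model, and the exact options are best checked against the AI Hub quantization docs:

import numpy as np
import qai_hub as hub

# Placeholder calibration data: map each model input name to a list of
# representative samples (real data gives much better quantization results).
calibration_data = {
    "audio": [np.random.randn(1, 80, 3000).astype(np.float32) for _ in range(10)],
}

# Quantize an ONNX model to INT8 weights / INT8 activations.
quantize_job = hub.submit_quantize_job(
    model="model.onnx",  # placeholder path to your ONNX model
    calibration_data=calibration_data,
    weights_dtype=hub.QuantizeDtype.INT8,
    activations_dtype=hub.QuantizeDtype.INT8,
)
quantized_onnx = quantize_job.get_target_model()

# Compile the quantized ONNX model to a TFLite asset for an on-device target.
compile_job = hub.submit_compile_job(
    model=quantized_onnx,
    device=hub.Device("Samsung Galaxy S24 (Family)"),  # example device
    options="--target_runtime tflite",
)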
Hi,
When I try to run the export command with the precision flag set to int (or even int8), I get the following error:
$./.venv/bin/python -m qai_hub_models.models.whisper_tiny_en.export --precision int8 --output-dir ./model --skip-inferencing
export.py: error: argument --precision: invalid choice: 'int8' (choose from 'float')
Do you have any docs on converting the Whisper model to INT8?
We haven't onboarded a quantized version of Whisper yet. This process is a bit involved and is not something our quantize jobs can do automatically for this model. If you are motivated enough, I would point you to using AIMET directly. AIMET assets can then be submitted to AI Hub for compilation.
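For anyone who wants to try, below is a minimal sketch of the AIMET (aimet_torch 1.x) post-training quantization flow the reply points to. The module, input shape, and calibration data are placeholders (a real Whisper quantization would quantize the traced encoder and decoder separately and use real audio features for calibration), so treat this as a starting point rather than a recipe:

import os
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# Stand-in module; in practice this would be the traced Whisper encoder
# (or decoder) torch.nn.Module.
model = torch.nn.Sequential(
    torch.nn.Conv1d(80, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv1d(64, 64, kernel_size=3, padding=1),
).eval()

# Assumed input shape: an 80-bin mel spectrogram, as Whisper's encoder expects.
dummy_input = torch.randn(1, 80, 3000)

# Simulate INT8 weights and INT8 activations.
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,
    default_output_bw=8,
)

# Calibration: run representative data through the sim model so AIMET can
# compute quantization encodings. Random data here; use real audio features.
def pass_calibration_data(sim_model, _):
    with torch.no_grad():
        for _ in range(10):
            sim_model(torch.randn(1, 80, 3000))

sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

# Export the ONNX model plus .encodings file (the "AIMET assets"); these can
# then be uploaded to AI Hub and compiled for the target runtime and device.
os.makedirs("./aimet_export", exist_ok=True)
sim.export(path="./aimet_export", filename_prefix="whisper_tiny_encoder", dummy_input=dummy_input)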