Quantized TFLite?
Hi,
Thanks for the models, but are there any resources on how we can quantize them to INT8 or FP16?
You can use the Quantize job in AI Hub to quantize models to INT8/INT16 (weights as INT8 and activations as INT8 or INT16). TFLite models on HTP already run with FP16 activations.
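In case it helps, here is a minimal sketch of that flow using the qai_hub Python client. The ONNX path, input name/shape, calibration data, and device below are placeholders for your own model, and the exact options are best checked against the AI Hub quantization docs:

import numpy as np
import qai_hub as hub

# Placeholder calibration data: map each model input name to a list of
# representative samples (real data gives much better quantization results).
calibration_data = {
    "audio": [np.random.randn(1, 80, 3000).astype(np.float32) for _ in range(10)],
}

# Quantize an ONNX model to INT8 weights / INT8 activations.
quantize_job = hub.submit_quantize_job(
    model="model.onnx",  # placeholder path to your ONNX model
    calibration_data=calibration_data,
    weights_dtype=hub.QuantizeDtype.INT8,
    activations_dtype=hub.QuantizeDtype.INT8,
)
quantized_onnx = quantize_job.get_target_model()

# Compile the quantized ONNX model to a TFLite asset for an on-device target.
compile_job = hub.submit_compile_job(
    model=quantized_onnx,
    device=hub.Device("Samsung Galaxy S24 (Family)"),  # example device
    options="--target_runtime tflite",
)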
Hi,
When I try to run the export command with the precision flag set to int (or even int8), I get the following error:
$./.venv/bin/python -m qai_hub_models.models.whisper_tiny_en.export --precision int8 --output-dir ./model --skip-inferencing
export.py: error: argument --precision: invalid choice: 'int8' (choose from 'float')
Do you have any docs on converting the Whisper model to INT8?
We haven't onboarded a quantized version of Whisper yet. This process is a bit involved and is not something our quantize jobs can do automatically for this model. If you are motivated enough, I would point you to using AIMET directly. AIMET assets can then be submitted to AI Hub for compilation.
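For anyone who wants to try, below is a minimal sketch of the AIMET (aimet_torch 1.x) post-training quantization flow the reply points to. The module, input shape, and calibration data are placeholders (a real Whisper quantization would quantize the traced encoder and decoder separately and use real audio features for calibration), so treat this as a starting point rather than a recipe:

import os
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# Stand-in module; in practice this would be the traced Whisper encoder
# (or decoder) torch.nn.Module.
model = torch.nn.Sequential(
    torch.nn.Conv1d(80, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv1d(64, 64, kernel_size=3, padding=1),
).eval()

# Assumed input shape: an 80-bin mel spectrogram, as Whisper's encoder expects.
dummy_input = torch.randn(1, 80, 3000)

# Simulate INT8 weights and INT8 activations.
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,
    default_output_bw=8,
)

# Calibration: run representative data through the sim model so AIMET can
# compute quantization encodings. Random data here; use real audio features.
def pass_calibration_data(sim_model, _):
    with torch.no_grad():
        for _ in range(10):
            sim_model(torch.randn(1, 80, 3000))

sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

# Export the ONNX model plus .encodings file (the "AIMET assets"); these can
# then be uploaded to AI Hub and compiled for the target runtime and device.
os.makedirs("./aimet_export", exist_ok=True)
sim.export(path="./aimet_export", filename_prefix="whisper_tiny_encoder", dummy_input=dummy_input)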