SigLIP (shape-optimized model)

SigLIP model pre-trained on WebLI at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository.

The original model repo is https://huggingface.co/google/siglip-so400m-patch14-384.
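If you just want to sanity-check the original checkpoint on a regular PC before moving to the NPU, the usual transformers usage looks roughly like the sketch below (it assumes torch, transformers, and Pillow are installed, and reuses the sample image and prompts from this repo):

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

ckpt = "google/siglip-so400m-patch14-384"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("000000039769.jpg")
texts = ["a photo of 2 cats", "a photo of 2 dogs"]

# SigLIP was trained with max_length (64-token) padding, so pad the same way here.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores each image-text pair with an independent sigmoid, not a softmax.
probs = torch.sigmoid(outputs.logits_per_image)
print(f"{probs[0][0]:.1%} that image 0 is '{texts[0]}'")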

This version of SigLIP has been converted to run on the Axera NPU using w8a16 quantization.

Compatible with Pulsar2 version: 3.4

Conversion tool links:

For those interested in model conversion, you can try exporting the axmodel yourself with the Pulsar2 toolchain (version 3.4, as noted above); a rough sketch of the ONNX export step follows.
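The onnx/ directory in this repo already contains separate vision and text encoders. Below is a rough sketch of how such a split ONNX export could be produced; the output names and opset version are assumptions, not necessarily what was used for the checked-in files:

import torch
from transformers import SiglipVisionModel, SiglipTextModel

repo = "google/siglip-so400m-patch14-384"
vision = SiglipVisionModel.from_pretrained(repo).eval()
text = SiglipTextModel.from_pretrained(repo).eval()
# Return plain tuples so the ONNX exporter sees a fixed output structure.
vision.config.return_dict = False
text.config.return_dict = False

# Vision encoder: fixed 384x384 input, patch size 14.
torch.onnx.export(
    vision, (torch.randn(1, 3, 384, 384),),
    "siglip-so400m-patch14-384_vision.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state", "pooler_output"],
    opset_version=17,
)

# Text encoder: SigLIP uses a fixed 64-token sequence.
torch.onnx.export(
    text, (torch.zeros(1, 64, dtype=torch.long),),
    "siglip-so400m-patch14-384_text.onnx",
    input_names=["input_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    opset_version=17,
)

The resulting ONNX files can then be compiled to .axmodel with the Pulsar2 toolchain mentioned above.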

Supported Platforms

Per-run inference time:

Model           Raspberry Pi 5 (CPU only)   Intel i7-13700   Raspberry Pi 5 + M.2 card
Image Encoder   8.3 s                       1.2 s            0.19 s
Text Encoder    1.3 s                       0.3 s            0.05 s

How to use

Download all files from this repository to the device.
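If huggingface_hub is installed, a minimal way to fetch everything in one go is shown below (the repo id is this model card's repo):

from huggingface_hub import snapshot_download

# Downloads the axmodel, ONNX, tokenizer, sample image, and Python scripts.
snapshot_download(
    repo_id="AXERA-TECH/siglip-so400m-patch14-384",
    local_dir="siglip",
)

After downloading, the layout on the device should look like this: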

(axcl) axera@raspberrypi:~/samples/siglip $ tree -L 2
.
├── 000000039769.jpg
├── ax650
│   ├── siglip_text_u16.axmodel
│   └── siglip_vision_u16_fcu8.axmodel
├── config.json
├── onnx
│   ├── siglip-so400m-patch14-384_text.onnx
│   └── siglip-so400m-patch14-384_vision.onnx
├── python
│   ├── inference_axmodel.py
│   ├── inference_onnx.py
│   └── requirements.txt
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

5 directories, 15 files

Python environment requirements

pyaxengine

https://github.com/AXERA-TECH/pyaxengine

wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl

Other dependencies

pip install -r python/requirements.txt

Inputs

Text: "a photo of 2 cats", "a photo of 2 dogs"

Image: 000000039769.jpg (the sample image included in this repository)
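Both inference scripts have to reproduce SigLIP's preprocessing before feeding the encoders. A rough sketch of the standard SigLIP preprocessing is given below; the 0.5 image mean/std and the fixed 64-token text length are the usual SigLIP defaults and are assumptions here, so check inference_axmodel.py for the exact layout the axmodels expect:

import numpy as np
from PIL import Image
from transformers import AutoTokenizer

# Image: resize to 384x384, scale to [0, 1], normalize with mean/std of 0.5, NCHW layout.
image = Image.open("000000039769.jpg").convert("RGB").resize((384, 384))
pixels = np.asarray(image, dtype=np.float32) / 255.0
pixels = (pixels - 0.5) / 0.5
pixel_values = pixels.transpose(2, 0, 1)[None, ...]   # shape (1, 3, 384, 384)

# Text: pad/truncate every prompt to a fixed 64-token sequence.
tokenizer = AutoTokenizer.from_pretrained("tokenizer")  # the tokenizer/ dir in this repo
texts = ["a photo of 2 cats", "a photo of 2 dogs"]
input_ids = tokenizer(
    texts, padding="max_length", max_length=64, truncation=True, return_tensors="np"
)["input_ids"]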

Inference with an AX650 host, such as the M4N-Dock (AXera-Pi Pro)

root@ax650:/mnt/qtang/inner/SigLIP.axera# python3 python/inference_axmodel.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.7.2a
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.86 seconds
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.22 seconds
Total model loading time: 7.08 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
49.4% that image 0 is 'a photo of 2 cats'
root@ax650:/mnt/qtang/inner/SigLIP.axera# 
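The percentage printed above is SigLIP's pairwise sigmoid score rather than a softmax over the prompts, so the scores for the two prompts do not have to sum to 100%. A rough sketch of the scoring step, assuming the two axmodels return image and text embeddings (the logit scale and bias below are placeholders; the real values are learned parameters from the checkpoint):

import numpy as np

def siglip_scores(image_embeds, text_embeds, logit_scale, logit_bias):
    """Pairwise image-text probabilities via SigLIP's sigmoid head."""
    # L2-normalize both sets of embeddings.
    image_embeds = image_embeds / np.linalg.norm(image_embeds, axis=-1, keepdims=True)
    text_embeds = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # Scaled, shifted cosine similarities pushed through a sigmoid.
    logits = logit_scale * image_embeds @ text_embeds.T + logit_bias
    return 1.0 / (1.0 + np.exp(-logits))

# Example: 1 image embedding vs 2 text embeddings -> a (1, 2) matrix of probabilities.
probs = siglip_scores(np.random.randn(1, 1152), np.random.randn(2, 1152),
                      logit_scale=100.0, logit_bias=-10.0)  # placeholder scale/bias
print(f"{probs[0, 0]:.1%} that image 0 is 'a photo of 2 cats'")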

Inference with M.2 Accelerator card

The M.2 accelerator card is an AX650N-based NPU add-in card; the demo below runs on a Raspberry Pi 5 with the card installed.

(axcl) axera@raspberrypi:~/samples/siglip $ python python/inference_axmodel.py
[INFO] Available providers:  ['AXCLRTExecutionProvider']
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.31 seconds
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.37 seconds
Total model loading time: 24.68 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
52.5% that image 0 is 'a photo of 2 cats'
(axcl) axera@raspberrypi:~/samples/siglip $ 