SigLIP (shape-optimized model)

SigLIP model pre-trained on WebLI at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository.

The original model repo is https://huggingface.co/google/siglip-so400m-patch14-384.
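If you just want to sanity-check the original checkpoint on a regular PC before moving to the NPU, the usual transformers usage looks roughly like the sketch below (it assumes torch, transformers, and Pillow are installed, and reuses the sample image and prompts from this repo):

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

ckpt = "google/siglip-so400m-patch14-384"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("000000039769.jpg")
texts = ["a photo of 2 cats", "a photo of 2 dogs"]

# SigLIP was trained with max_length (64-token) padding, so pad the same way here.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores each image-text pair with an independent sigmoid, not a softmax.
probs = torch.sigmoid(outputs.logits_per_image)
print(f"{probs[0][0]:.1%} that image 0 is '{texts[0]}'")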

This version of SigLIP has been converted to run on the Axera NPU using w8a16 quantization.

Compatible with Pulsar2 version: 3.4

Conversion tool links:

For those interested in model conversion, you can try exporting the axmodel yourself with the Pulsar2 toolchain (version 3.4, as noted above); a rough sketch of the ONNX export step follows.
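The onnx/ directory in this repo already contains separate vision and text encoders. Below is a rough sketch of how such a split ONNX export could be produced; the output names and opset version are assumptions, not necessarily what was used for the checked-in files:

import torch
from transformers import SiglipVisionModel, SiglipTextModel

repo = "google/siglip-so400m-patch14-384"
vision = SiglipVisionModel.from_pretrained(repo).eval()
text = SiglipTextModel.from_pretrained(repo).eval()
# Return plain tuples so the ONNX exporter sees a fixed output structure.
vision.config.return_dict = False
text.config.return_dict = False

# Vision encoder: fixed 384x384 input, patch size 14.
torch.onnx.export(
    vision, (torch.randn(1, 3, 384, 384),),
    "siglip-so400m-patch14-384_vision.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state", "pooler_output"],
    opset_version=17,
)

# Text encoder: SigLIP uses a fixed 64-token sequence.
torch.onnx.export(
    text, (torch.zeros(1, 64, dtype=torch.long),),
    "siglip-so400m-patch14-384_text.onnx",
    input_names=["input_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    opset_version=17,
)

The resulting ONNX files can then be compiled to .axmodel with the Pulsar2 toolchain mentioned above.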

Supported Platforms

Per-run inference time:

Model           Raspberry Pi 5 (CPU only)   Intel i7-13700   Raspberry Pi 5 + M.2 card
Image Encoder   8.3 s                       1.2 s            0.19 s
Text Encoder    1.3 s                       0.3 s            0.05 s

How to use

Download all files from this repository to the device.
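If huggingface_hub is installed, a minimal way to fetch everything in one go is shown below (the repo id is this model card's repo):

from huggingface_hub import snapshot_download

# Downloads the axmodel, ONNX, tokenizer, sample image, and Python scripts.
snapshot_download(
    repo_id="AXERA-TECH/siglip-so400m-patch14-384",
    local_dir="siglip",
)

After downloading, the layout on the device should look like this: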

(axcl) axera@raspberrypi:~/samples/siglip $ tree -L 2
.
├── 000000039769.jpg
├── ax650
│   ├── siglip_text_u16.axmodel
│   └── siglip_vision_u16_fcu8.axmodel
├── config.json
├── onnx
│   ├── siglip-so400m-patch14-384_text.onnx
│   └── siglip-so400m-patch14-384_vision.onnx
├── python
│   ├── inference_axmodel.py
│   ├── inference_onnx.py
│   └── requirements.txt
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

5 directories, 15 files

Python environment requirements

pyaxengine

https://github.com/AXERA-TECH/pyaxengine

wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl

Other dependencies

pip install -r python/requirements.txt

Inputs

Text: "a photo of 2 cats", "a photo of 2 dogs"

Image: 000000039769.jpg (the sample image included in this repository)
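Both inference scripts have to reproduce SigLIP's preprocessing before feeding the encoders. A rough sketch of the standard SigLIP preprocessing is given below; the 0.5 image mean/std and the fixed 64-token text length are the usual SigLIP defaults and are assumptions here, so check inference_axmodel.py for the exact layout the axmodels expect:

import numpy as np
from PIL import Image
from transformers import AutoTokenizer

# Image: resize to 384x384, scale to [0, 1], normalize with mean/std of 0.5, NCHW layout.
image = Image.open("000000039769.jpg").convert("RGB").resize((384, 384))
pixels = np.asarray(image, dtype=np.float32) / 255.0
pixels = (pixels - 0.5) / 0.5
pixel_values = pixels.transpose(2, 0, 1)[None, ...]   # shape (1, 3, 384, 384)

# Text: pad/truncate every prompt to a fixed 64-token sequence.
tokenizer = AutoTokenizer.from_pretrained("tokenizer")  # the tokenizer/ dir in this repo
texts = ["a photo of 2 cats", "a photo of 2 dogs"]
input_ids = tokenizer(
    texts, padding="max_length", max_length=64, truncation=True, return_tensors="np"
)["input_ids"]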

Inference with an AX650 host, such as the M4N-Dock (AXera-Pi Pro)

root@ax650:/mnt/qtang/inner/SigLIP.axera# python3 python/inference_axmodel.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.7.2a
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.86 seconds
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.22 seconds
Total model loading time: 7.08 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
49.4% that image 0 is 'a photo of 2 cats'
root@ax650:/mnt/qtang/inner/SigLIP.axera# 
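The percentage printed above is SigLIP's pairwise sigmoid score rather than a softmax over the prompts, so the scores for the two prompts do not have to sum to 100%. A rough sketch of the scoring step, assuming the two axmodels return image and text embeddings (the logit scale and bias below are placeholders; the real values are learned parameters from the checkpoint):

import numpy as np

def siglip_scores(image_embeds, text_embeds, logit_scale, logit_bias):
    """Pairwise image-text probabilities via SigLIP's sigmoid head."""
    # L2-normalize both sets of embeddings.
    image_embeds = image_embeds / np.linalg.norm(image_embeds, axis=-1, keepdims=True)
    text_embeds = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # Scaled, shifted cosine similarities pushed through a sigmoid.
    logits = logit_scale * image_embeds @ text_embeds.T + logit_bias
    return 1.0 / (1.0 + np.exp(-logits))

# Example: 1 image embedding vs 2 text embeddings -> a (1, 2) matrix of probabilities.
probs = siglip_scores(np.random.randn(1, 1152), np.random.randn(2, 1152),
                      logit_scale=100.0, logit_bias=-10.0)  # placeholder scale/bias
print(f"{probs[0, 0]:.1%} that image 0 is 'a photo of 2 cats'")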

Inference with M.2 Accelerator card

The M.2 accelerator card is an AX650N-based NPU add-in card; the demo below runs on a Raspberry Pi 5 with the card installed.

(axcl) axera@raspberrypi:~/samples/siglip $ python python/inference_axmodel.py
[INFO] Available providers:  ['AXCLRTExecutionProvider']
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.31 seconds
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.37 seconds
Total model loading time: 24.68 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
52.5% that image 0 is 'a photo of 2 cats'
(axcl) axera@raspberrypi:~/samples/siglip $ 