SigLIP (shape-optimized model)
SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository.
The original repo is https://huggingface.co/google/siglip-so400m-patch14-384.
This model of SigLIP has been converted to run on the Axera NPU using w8a16 quantization.
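As a rough illustration of what w8a16 means (weights stored as 8-bit integers, activations as 16-bit integers), the sketch below applies symmetric per-tensor quantization in plain Python. It is a toy under stated assumptions, not Pulsar2's actual quantization scheme: the helper names `quantize_symmetric` and `dequantize` and all sample values are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integers of the given width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.42, -1.30, 0.07, 0.95]       # made-up weight values
activations = [3.2, -0.004, 1.75, -2.6]   # made-up activation values

w_q, w_scale = quantize_symmetric(weights, bits=8)       # "w8"
a_q, a_scale = quantize_symmetric(activations, bits=16)  # "a16"

# The rounding error per value is at most half a quantization step (scale / 2).
w_err = max(abs(w - d) for w, d in zip(weights, dequantize(w_q, w_scale)))
a_err = max(abs(a - d) for a, d in zip(activations, dequantize(a_q, a_scale)))
print(f"8-bit weight error:      {w_err:.6f} (step {w_scale:.6f})")
print(f"16-bit activation error: {a_err:.6f} (step {a_scale:.6f})")
```

The 16-bit step is far smaller than the 8-bit one, which is the usual motivation for keeping activations at higher precision while compressing the weights.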
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: 3.4
Conversion tool links:
For those interested in model conversion, you can try exporting the axmodel through
The AXera Platform repo, where you can find the detailed guide
Supported Platforms
| Model | Raspberry Pi 5 (CPU only) | Intel i7-13700 | Raspberry Pi 5 + M.2 Card |
|---|---|---|---|
| Image Encoder | 8.3 s | 1.2 s | 0.19 s |
| Text Encoder | 1.3 s | 0.3 s | 0.05 s |
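To put the latencies in the table into perspective, the following small sketch computes the speedup of the M.2 card setup over the two CPU-only columns. The dictionary keys and the `speedup` helper are names chosen for this example; the numbers are copied from the table.

```python
# Latencies in seconds, copied from the table above.
latency = {
    "Image Encoder": {"pi5_cpu": 8.3, "i7_13700": 1.2, "pi5_m2": 0.19},
    "Text Encoder": {"pi5_cpu": 1.3, "i7_13700": 0.3, "pi5_m2": 0.05},
}


def speedup(baseline, accelerated):
    """How many times faster the accelerated run is than the baseline."""
    return baseline / accelerated


for name, row in latency.items():
    vs_pi = speedup(row["pi5_cpu"], row["pi5_m2"])
    vs_i7 = speedup(row["i7_13700"], row["pi5_m2"])
    print(f"{name}: {vs_pi:.1f}x vs Pi 5 CPU, {vs_i7:.1f}x vs i7-13700")
```

These ratios are based on single reported timings, so they should be read as rough indications rather than benchmarks.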
How to use
Download all files from this repository to the device
(axcl) axera@raspberrypi:~/samples/siglip $ tree -L 2
.
├── 000000039769.jpg
├── ax650
│   ├── siglip_text_u16.axmodel
│   └── siglip_vision_u16_fcu8.axmodel
├── config.json
├── onnx
│   ├── siglip-so400m-patch14-384_text.onnx
│   └── siglip-so400m-patch14-384_vision.onnx
├── python
│   ├── inference_axmodel.py
│   ├── inference_onnx.py
│   └── requirements.txt
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json
5 directories, 15 files
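After downloading, a quick check like the one below can confirm that the files from the listing above are in place. This is a minimal sketch: the `missing_files` helper is hypothetical, and which of these files the AX650 inference script actually requires is an assumption based on the file names.

```python
from pathlib import Path

# Relative paths taken from the tree listing above.
EXPECTED_FILES = [
    "000000039769.jpg",
    "config.json",
    "ax650/siglip_text_u16.axmodel",
    "ax650/siglip_vision_u16_fcu8.axmodel",
    "python/inference_axmodel.py",
    "python/requirements.txt",
    "tokenizer/tokenizer.json",
    "tokenizer/spiece.model",
]


def missing_files(root="."):
    """Return the expected paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


missing = missing_files(".")
if missing:
    print("Missing files:")
    for rel in missing:
        print(f"  {rel}")
else:
    print("All expected files are present.")
```

Run it from the directory that contains `ax650`, `python`, and `tokenizer`.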
Python environment requirements
pyaxengine
https://github.com/AXERA-TECH/pyaxengine
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
Other dependencies
pip install -r python/requirements.txt
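Before running the inference script, it can help to confirm that the wheel installed correctly. The sketch below only checks whether a module can be found, without importing it. The import name `axengine` is assumed from the wheel file name, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "axengine" is assumed to be the import name of the pyaxengine wheel.
for name in ("axengine",):
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

Other packages listed in `python/requirements.txt` can be added to the tuple, keeping in mind that a package's import name may differ from its pip name.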
Inputs
Text
"a photo of 2 cats", "a photo of 2 dogs"
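SigLIP is trained with a sigmoid loss, so each image-text pair gets its own independent probability instead of a softmax over all prompts. The sketch below shows that scoring rule in plain Python. The embeddings, `logit_scale`, and `logit_bias` are placeholder values (in the real model the scale and bias are learned), and whether `inference_axmodel.py` computes its printed percentage in exactly this way is an assumption.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def siglip_probability(image_emb, text_emb, logit_scale, logit_bias):
    """Sigmoid of the scaled cosine similarity plus bias, for one image-text pair."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    cosine = sum(a * b for a, b in zip(img, txt))
    logit = logit_scale * cosine + logit_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder 3-dimensional embeddings; real embeddings are much larger.
image_emb = [0.2, 0.9, 0.1]
text_embs = {
    "a photo of 2 cats": [0.25, 0.85, 0.05],
    "a photo of 2 dogs": [0.9, -0.1, 0.3],
}

for prompt, emb in text_embs.items():
    p = siglip_probability(image_emb, emb, logit_scale=10.0, logit_bias=-10.0)
    print(f"{p:.1%} that image 0 is '{prompt}'")
```

Because each prompt is scored on its own, the probabilities do not have to sum to 100%, and a well-matching prompt can still score around 50%.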
Inference with an AX650 host, such as M4N-Dock (AXera-Pi Pro)
root@ax650:/mnt/qtang/inner/SigLIP.axera# python3 python/inference_axmodel.py
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.7.2a
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.86 seconds
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 3.22 seconds
Total model loading time: 7.08 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
49.4% that image 0 is 'a photo of 2 cats'
root@ax650:/mnt/qtang/inner/SigLIP.axera#
Inference with M.2 Accelerator card
What is the M.2 Accelerator card? This demo runs on a Raspberry Pi 5.
(axcl) axera@raspberrypi:~/samples/siglip $ python python/inference_axmodel.py
[INFO] Available providers: ['AXCLRTExecutionProvider']
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.31 seconds
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4-dirty 739e2b35-dirty
Model loading time: 12.37 seconds
Total model loading time: 24.68 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
52.5% that image 0 is 'a photo of 2 cats'
(axcl) axera@raspberrypi:~/samples/siglip $
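When comparing runs across hosts, the timing lines can be pulled out of the log with a short script. The sketch below parses the timing lines from the M.2 run above and checks that the per-model times add up to the reported totals. The `extract_times` helper is hypothetical and relies on the log format shown in this document.

```python
import re

# Timing lines copied from the M.2 accelerator card run above.
LOG = """\
Model loading time: 12.31 seconds
Model loading time: 12.37 seconds
Total model loading time: 24.68 seconds
Model inference time: 0.19 seconds
Model inference time: 0.05 seconds
Total inference time: 0.24 seconds
"""


def extract_times(log_text, label):
    """Return all durations (seconds) from lines of the form '<label>: <n> seconds'."""
    pattern = re.compile(rf"^{re.escape(label)}: ([0-9.]+) seconds$", re.MULTILINE)
    return [float(value) for value in pattern.findall(log_text)]


load_times = extract_times(LOG, "Model loading time")
infer_times = extract_times(LOG, "Model inference time")

print(f"Loading:   {load_times} -> total {sum(load_times):.2f} s")
print(f"Inference: {infer_times} -> total {sum(infer_times):.2f} s")
```

Inference time is identical in both runs (0.24 s in total), while model loading takes noticeably longer on the M.2 card setup (24.68 s versus 7.08 s).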