---
license: mit
library_name: onnxruntime
tags:
- tinyllama
- onnx
- quantized
- edge-llm
- raspberry-pi
- local-inference
model_creator: TinyLlama
language: en
---
# TinyLlama-1.1B-Chat-v1.0 (ONNX): Local LLM Model Repository

This repository contains quantized ONNX exports of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), optimized for efficient local inference on resource-constrained devices such as Raspberry Pi and other ARM-based single-board computers.

---
## ONNX Model

**Included files:**

- `model.onnx`
- `model_quantized.onnx`
- `model.onnx.data` (if sharded)
- Configuration files (`config.json`, `tokenizer.json`, etc.)

**Recommended for:** ONNX Runtime, Arm KleidiAI, and other compatible frameworks.

### Quick Start

```python
import onnxruntime as ort

# Load the full-precision export; swap in "model_quantized.onnx" for the smaller quantized variant.
session = ort.InferenceSession("model.onnx")

# ... inference code here ...
```
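
For end-to-end chat-style generation, the [Optimum](https://github.com/huggingface/optimum) library (also used to produce these exports) exposes the ONNX model through the familiar `transformers` generation API. The snippet below is a minimal sketch, not a verified recipe: the local directory name is hypothetical, and it assumes `optimum[onnxruntime]` and `transformers` are installed.

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_dir = "./tinyllama-onnx"  # hypothetical path to a local copy of this repository

tokenizer = AutoTokenizer.from_pretrained(model_dir)
# file_name selects which ONNX graph to load; here, the quantized variant.
model = ORTModelForCausalLM.from_pretrained(model_dir, file_name="model_quantized.onnx")

# TinyLlama-1.1B-Chat ships a chat template; apply it before generating.
messages = [{"role": "user", "content": "Explain ONNX in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```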

The ONNX export enables efficient inference on CPUs, NPUs, and other accelerators, making it ideal for local or edge deployments.
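
On a CPU-only board such as a Raspberry Pi, it can help to pin the session to the CPU execution provider and match the intra-op thread count to the number of physical cores. The following is a minimal sketch; the thread counts are illustrative, not tuned values.

```python
import onnxruntime as ort

# Example session configuration for a small ARM board (values are illustrative).
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # e.g. the four cores of a Raspberry Pi 4/5
opts.inter_op_num_threads = 1
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession(
    "model_quantized.onnx",  # the quantized graph is usually preferable on low-memory devices
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
print(session.get_providers())
```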

---

## Credits

- **Base model:** [TinyLlama](https://huggingface.co/TinyLlama)
- **ONNX export:** [Optimum](https://github.com/huggingface/optimum), [ONNX Runtime](https://github.com/microsoft/onnxruntime)
- **Model optimization:** quantized and optimized for ARM devices such as the Raspberry Pi

---

**Maintainer:** [Makatia](https://huggingface.co/Makatia)