---
license: mit
library_name: onnxruntime
tags:
- tinyllama
- onnx
- quantized
- edge-llm
- raspberry-pi
- local-inference
model_creator: TinyLlama
language: en
---
# TinyLlama-1.1B-Chat-v1.0 (ONNX): Local LLM Model Repository
This repository contains quantized ONNX exports of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), optimized for efficient local inference on resource-constrained devices such as Raspberry Pi and other ARM-based single-board computers.
---
## 🟦 ONNX Model
**Included files:**
- `model.onnx`
- `model_quantized.onnx`
- `model.onnx.data` (external weight data, present when the weights are stored outside the graph file)
- Configuration files (`config.json`, `tokenizer.json`, etc.)
**Recommended for:** ONNX Runtime, Arm KleidiAI, and other compatible frameworks.
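To fetch all of these files in one call, here is a minimal sketch using `huggingface_hub` (note: the `repo_id` below is an assumption; substitute this repository's actual id):
```python
from huggingface_hub import snapshot_download

# Download the full repository (ONNX graphs, tokenizer, config) to the local cache.
# NOTE: repo_id is an assumption -- replace it with this repository's actual id.
local_dir = snapshot_download(repo_id="Makatia/TinyLlama-1.1B-Chat-v1.0-ONNX")
print(local_dir)  # directory containing model.onnx, model_quantized.onnx, tokenizer.json, ...
```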
### Quick Start
```python
import onnxruntime as ort

# Load the full-precision export; swap in "model_quantized.onnx" for the smaller quantized variant
session = ort.InferenceSession("model.onnx")

# Inspect the input signature the exported graph expects before feeding tensors
print([inp.name for inp in session.get_inputs()])
```
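Driving a causal LM through raw `InferenceSession` calls means managing `input_ids`, attention masks, and the KV cache yourself. A simpler path is Optimum (credited below for the export), which wraps the ONNX graph behind the familiar `transformers` generation API. A sketch, assuming the repository files sit in the current directory:
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# "." assumes model.onnx, config.json, and the tokenizer files are in the working
# directory; file_name picks between the full-precision and quantized graphs.
model = ORTModelForCausalLM.from_pretrained(".", file_name="model.onnx")
tokenizer = AutoTokenizer.from_pretrained(".")

# TinyLlama-Chat ships a chat template in its tokenizer config; use it to format the prompt.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain ONNX in one sentence."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```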
The ONNX export enables efficient inference on CPUs, NPUs, and other accelerators, making it ideal for local or edge deployments.
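Which of those accelerators is actually used depends on the execution providers compiled into your `onnxruntime` build. A quick way to check, and to pin a provider explicitly:
```python
import onnxruntime as ort

# CPUExecutionProvider is always available; accelerator providers appear only
# if the installed onnxruntime build was compiled with them.
print(ort.get_available_providers())

# Pass providers in preference order; onnxruntime falls back to later entries.
session = ort.InferenceSession(
    "model_quantized.onnx",
    providers=["CPUExecutionProvider"],
)
```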
---
## πŸ“‹ Credits
- **Base model:** [TinyLlama](https://huggingface.co/TinyLlama)
- **ONNX export:** [Optimum](https://github.com/huggingface/optimum), [ONNX Runtime](https://github.com/microsoft/onnxruntime)
- **Model optimization:** quantized and tuned for ARM devices such as the Raspberry Pi
---
**Maintainer:** [Makatia](https://huggingface.co/Makatia)