---
license: mit
library_name: onnxruntime
tags:
- tinyllama
- onnx
- quantized
- edge-llm
- raspberry-pi
- local-inference
model_creator: TinyLlama
language: en
---
# TinyLlama-1.1B-Chat-v1.0 (ONNX): Local LLM Model Repository

This repository contains quantized ONNX exports of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), optimized for efficient local inference on resource-constrained devices such as Raspberry Pi and other ARM-based single-board computers.

---
## ONNX Model

**Included files:**

- `model.onnx`
- `model_quantized.onnx`
- `model.onnx.data` (if sharded)
- Configuration files (`config.json`, `tokenizer.json`, etc.)

**Recommended for:** ONNX Runtime, Arm KleidiAI, and other compatible frameworks.

### Quick Start

```python
import onnxruntime as ort

# Load the full-precision export; swap in "model_quantized.onnx" for the smaller quantized variant.
session = ort.InferenceSession("model.onnx")

# ... inference code here ...
```
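
For end-to-end chat-style generation, the [Optimum](https://github.com/huggingface/optimum) library (also used to produce these exports) exposes the ONNX model through the familiar `transformers` generation API. The snippet below is a minimal sketch, not a verified recipe: the local directory name is hypothetical, and it assumes `optimum[onnxruntime]` and `transformers` are installed.

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_dir = "./tinyllama-onnx"  # hypothetical path to a local copy of this repository

tokenizer = AutoTokenizer.from_pretrained(model_dir)
# file_name selects which ONNX graph to load; here, the quantized variant.
model = ORTModelForCausalLM.from_pretrained(model_dir, file_name="model_quantized.onnx")

# TinyLlama-1.1B-Chat ships a chat template; apply it before generating.
messages = [{"role": "user", "content": "Explain ONNX in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```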

The ONNX export enables efficient inference on CPUs, NPUs, and other accelerators, making it ideal for local or edge deployments.
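
On a CPU-only board such as a Raspberry Pi, it can help to pin the session to the CPU execution provider and match the intra-op thread count to the number of physical cores. The following is a minimal sketch; the thread counts are illustrative, not tuned values.

```python
import onnxruntime as ort

# Example session configuration for a small ARM board (values are illustrative).
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # e.g. the four cores of a Raspberry Pi 4/5
opts.inter_op_num_threads = 1
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession(
    "model_quantized.onnx",  # the quantized graph is usually preferable on low-memory devices
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
print(session.get_providers())
```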

---

## Credits

- **Base model:** [TinyLlama](https://huggingface.co/TinyLlama)
- **ONNX export:** [Optimum](https://github.com/huggingface/optimum), [ONNX Runtime](https://github.com/microsoft/onnxruntime)
- **Model optimization:** quantized and optimized for ARM devices such as the Raspberry Pi

---

**Maintainer:** [Makatia](https://huggingface.co/Makatia)