Makatia committed · Commit 204d1a3 · verified · 1 Parent(s): 2618516

Upload README.md with huggingface_hub

Files changed (1): README.md (+50 −0)
README.md CHANGED
---
license: mit
library_name: onnxruntime
tags:
- tinyllama
- onnx
- quantized
- edge-llm
- raspberry-pi
- local-inference
model_creator: TinyLlama
language: en
---

# TinyLlama-1.1B-Chat-v1.0 (ONNX): Local LLM Model Repository

This repository contains quantized ONNX exports of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), optimized for efficient local inference on resource-constrained devices such as the Raspberry Pi and other ARM-based single-board computers.

---

## 🟦 ONNX Model

**Included files:**

- `model.onnx`
- `model_quantized.onnx`
- `model.onnx.data` (if sharded)
- Configuration files (`config.json`, `tokenizer.json`, etc.)

**Recommended for:** ONNX Runtime, KleidiAI, and other compatible frameworks.
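
Files like `model_quantized.onnx` are commonly produced with ONNX Runtime's post-training dynamic quantization. The exact settings used for this repository are not documented, so the following is only an illustrative sketch:

```python
# Illustrative only: one common way to derive an INT8 model_quantized.onnx
# from the full-precision export. The actual settings used for this
# repository are not documented.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",
    model_output="model_quantized.onnx",
    weight_type=QuantType.QInt8,  # INT8 weights: smaller file, faster matmuls on ARM CPUs
)
```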

### Quick Start

```python
import onnxruntime as ort

# Load the full-precision export; use "model_quantized.onnx" for the INT8 variant.
session = ort.InferenceSession("model.onnx")
# ... inference code here ...
```
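
On a Raspberry Pi it often helps to pin the session to the CPU execution provider and set thread counts explicitly. A minimal sketch using standard `onnxruntime` session options; the thread count is an assumption for a 4-core board, not a documented recommendation for this model:

```python
import onnxruntime as ort

# Illustrative session tuning for a small ARM board (values are assumptions,
# not documented settings for this model).
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # e.g., one thread per core on a Pi 4/5
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession(
    "model_quantized.onnx",              # INT8 file from this repository
    sess_options=opts,
    providers=["CPUExecutionProvider"],  # pin to the CPU provider explicitly
)
print([i.name for i in session.get_inputs()])  # inspect expected input names
```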

The ONNX export enables efficient inference on CPUs, NPUs, and other accelerators, making it ideal for local and edge deployments.

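Since the export was produced with Optimum (see Credits), the simplest end-to-end path is `optimum.onnxruntime`'s `ORTModelForCausalLM`, which hides the session behind the familiar `generate()` API. A sketch under some assumptions: the repository is downloaded to a local directory (the path below is hypothetical), `file_name` selects which of the two ONNX files to load, and the tokenizer ships TinyLlama-Chat's chat template:

```python
# Sketch: end-to-end chat generation through Optimum's ONNX Runtime wrapper.
# "./tinyllama-onnx" is a hypothetical local copy of this repository.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_dir = "./tinyllama-onnx"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir, file_name="model_quantized.onnx")

# Build a prompt with the model's chat template, then generate.
messages = [{"role": "user", "content": "Explain ONNX in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
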
---

## 📋 Credits

- **Base model:** [TinyLlama](https://huggingface.co/TinyLlama)
- **ONNX export:** [Optimum](https://github.com/huggingface/optimum), [ONNX Runtime](https://github.com/microsoft/onnxruntime)
- **Optimization:** ARM-optimized for Raspberry Pi

---

**Maintainer:** [Makatia](https://huggingface.co/Makatia)