---
library_name: transformers
license: apache-2.0
language:
- en
- fr
- es
- it
- pt
- zh
- ar
- ru
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
tags:
- openvino
- int4
- quantization
- edge-deployment
- optimization
- smollm3
inference: false
---
# SmolLM3 INT4 OpenVINO
## 🚀 Optimized for Edge Deployment
This is an INT4 quantized version of [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) using OpenVINO, designed for efficient inference on edge devices and CPUs.
## Model Overview
- **Base Model:** SmolLM3-3B
- **Quantization:** INT4 via OpenVINO
- **Size Reduction:** ~4x vs. FP16/BF16 weights (exact on-disk figure pending benchmarking)
- **Target Hardware:** CPUs, Intel GPUs, NPUs
- **Use Cases:** Local inference, edge deployment, resource-constrained environments
## 🔧 Technical Details
### Quantization Process
```python
# Quantized using OpenVINO NNCF
# INT4 symmetric quantization
# Calibration dataset: [specify if used]
```
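For reproducibility, here is a minimal sketch of how a checkpoint like this can be produced with `optimum-intel`'s weight-compression API. The `group_size` shown is an assumed default, not necessarily the exact setting used for this checkpoint:

```python
# Sketch: INT4 symmetric weight-only quantization via optimum-intel / NNCF.
# group_size is an illustrative default, not the recorded setting.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

quant_config = OVWeightQuantizationConfig(
    bits=4,          # INT4 weights
    sym=True,        # symmetric quantization, as noted above
    group_size=128,  # assumed group size
)

model = OVModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    export=True,                      # convert to OpenVINO IR on the fly
    quantization_config=quant_config,
)
model.save_pretrained("smollm3-int4-ov")
```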
### Model Architecture
- Same architecture as SmolLM3-3B (see the config check below)
- Grouped-query attention (GQA) and NoPE preserved
- 64k context support (128k with YaRN)
- Multilingual capabilities maintained
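These properties can be sanity-checked from the repo's `config.json`, assuming the export keeps the original config (as `optimum` exports normally do):

```python
# Sanity-check the preserved architecture from config.json.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("dev-bjoern/smollm3-int4-ov")
print(cfg.num_attention_heads, cfg.num_key_value_heads)  # GQA head counts
print(cfg.max_position_embeddings)                       # native context length
```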
## 📊 Performance (Experimental)
> ⚠️ **Note:** This is an experimental quantization. Formal benchmarks pending.
Expected benefits of INT4 quantization:
- Reduced model size
- Faster CPU inference
- Lower memory requirements
- Some quality trade-off
Actual metrics will be added after proper benchmarking. In the meantime, the back-of-envelope estimate below gives a sense of the memory savings.
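As a rough estimate (ignoring quantization scales/zero points and any layers kept at higher precision):

```python
# Back-of-envelope weight footprint: 2 bytes/param (FP16/BF16) vs.
# 0.5 bytes/param (INT4). Overheads like scales are ignored.
params = 3.08e9  # approximate SmolLM3-3B parameter count
print(f"FP16 ≈ {params * 2 / 1e9:.1f} GB")    # ~6.2 GB
print(f"INT4 ≈ {params * 0.5 / 1e9:.1f} GB")  # ~1.5 GB
```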
## 🛠️ How to Use
### Installation
```bash
pip install "optimum[openvino]" transformers  # quotes protect the [extras] bracket in shells like zsh
```
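Optionally, verify that the OpenVINO runtime sees your hardware:

```python
# List the devices (CPU/GPU/NPU) available to the OpenVINO runtime.
import openvino as ov

print(ov.Core().available_devices)
```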
### Basic Usage
```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "dev-bjoern/smollm3-int4-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With Extended Thinking
```python
messages = [
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Solve this step by step: 25 * 16"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate with the reasoning-enabled prompt
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
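SmolLM3 also accepts a `/no_think` system flag to disable the extended reasoning trace when you want shorter answers.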
## 🎯 Intended Use
- **Edge AI applications**
- **Local LLM deployment**
- **Resource-constrained environments**
- **Privacy-focused applications**
- **Offline AI assistants**
## ⚡ Optimization Tips
1. **CPU Inference:** Use the OpenVINO runtime for best performance (see the sketch after this list)
2. **Batch Processing:** Consider batching requests when possible
3. **Memory:** INT4 significantly reduces memory requirements
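A minimal sketch of tip 1, assuming OpenVINO's standard performance-hint properties (values are starting points to tune, not required settings):

```python
# Load with OpenVINO runtime hints; adjust for your hardware.
from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained(
    "dev-bjoern/smollm3-int4-ov",
    ov_config={
        "PERFORMANCE_HINT": "LATENCY",  # single-stream, low-latency chat
        "CACHE_DIR": "ov_cache",        # reuse compiled blobs across runs
    },
)
```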
## 🧪 Experimental Status
This is my first experiment with OpenVINO INT4 quantization. Feedback and contributions are welcome!
### Known Limitations
- No formal benchmarks yet
- Quantization settings not fully optimized
- Some quality degradation vs full precision
### Future Improvements
- [ ] Comprehensive benchmarking
- [ ] Mixed precision experiments
- [ ] Model compression analysis
- [ ] Calibration dataset optimization
## 🤝 Contributing
Found issues or have suggestions? Please open a discussion or issue!
## 📚 Resources
- [Original SmolLM3 Model](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
- [OpenVINO Documentation](https://docs.openvino.ai/)
- [Optimum Intel](https://huggingface.co/docs/optimum/intel/index)
## 🙏 Acknowledgments
- HuggingFace team for SmolLM3
- Intel OpenVINO team for quantization tools
- Community for feedback and support
## 📝 Citation
If you use this model, please cite both the original and this work:
```bibtex
@misc{smollm3-int4-ov,
  author       = {Bjoern Bethge},
  title        = {SmolLM3 INT4 OpenVINO},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dev-bjoern/smollm3-int4-ov}}
}
```
---
**Status:** 🧪 Experimental | **Feedback:** Welcome | **License:** Apache 2.0