Update README.md
README.md CHANGED

@@ -3,7 +3,7 @@ license: apache-2.0
 base_model:
 - Qwen/Qwen3-32B
 ---
-# Qwen3-32B-AWQ-
+# Qwen3-32B-AWQ-GEMM-sc
 
 Original Model: https://huggingface.co/Qwen/Qwen3-32B
 
@@ -48,4 +48,11 @@ model.quantize(tokenizer, quant_config=quant_config)
 quant_path = './Qwen3-32B-AWQ-4bit-GEMM-sc'
 model.save_quantized(quant_path)
 tokenizer.save_pretrained(quant_path)
-```
+```
+
+## Final notes
+
+The quant appears to be significantly degraded. I'm trying one more quantization
+with 128 samples, a different dataset (HuggingFaceTB/cosmopedia-100k), and a longer
+max sequence length (40960). It will be ready in a few hours, and I'll upload it here:
+https://huggingface.co/kmouratidis/Qwen3-32B-AWQ-GEMM-lc