jusjinuk/Llama-2-70b-hf-4bit-SqueezeLLM


jusjinuk committed on Jun 19

Commit a67be32 · verified · 1 Parent(s): 8d8b67d

Create README.md

Files changed (1)

README.md +20 -0

README.md ADDED

	@@ -0,0 +1,20 @@

+---
+base_model:
+- meta-llama/Llama-2-70b-hf
+base_model_relation: quantized
+license: llama2
+---
+# Model Card
+- Base model: `meta-llama/Llama-2-70b-hf`
+- Quantization method: SqueezeLLM
+- Target bit-width: 4
+- Backend kernel: Any-Precision-LLM kernel (`ap-gemv`)
+- Calibration data: RedPajama (1024 sentences, 4096 tokens each)
+- Calibration objective: Next-token prediction
+# How to run
+- Follow the instructions at https://github.com/snu-mllab/GuidedQuant.
+# References
+- [Model Paper](https://arxiv.org/abs/2505.07004)
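
The card defers to the GuidedQuant repository for loading and inference, so the sketch below covers only a possible first step: fetching the quantized files from the Hub. It assumes the `huggingface_hub` package is installed; `fetch_checkpoint` and the `checkpoints` directory are hypothetical names chosen for illustration and are not part of this commit or of GuidedQuant.

```python
from pathlib import Path

# Repository id as shown at the top of this commit page.
REPO_ID = "jusjinuk/Llama-2-70b-hf-4bit-SqueezeLLM"


def fetch_checkpoint(local_dir: str = "checkpoints") -> Path:
    """Download every file in the repository and return the local folder.

    Hypothetical helper: it only fetches files. Loading the weights and
    running the `ap-gemv` kernel still requires the GuidedQuant code.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = Path(local_dir) / REPO_ID.split("/")[-1]
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `fetch_checkpoint()` would place the files under `checkpoints/Llama-2-70b-hf-4bit-SqueezeLLM`; from there, the GuidedQuant instructions linked in the card apply.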