ModelCloud/GPTQ-v2-Llama-3.1-8B-Instruct

4-bit precision


Qubitium committed on Apr 17 · Commit 5c39353 (verified) · 1 parent: e121bb6

Update README.md

Files changed (1)

README.md +44 -3

README.md CHANGED

@@ -1,3 +1,44 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
+tags:
+- gptqmodel
+- gptq
+- v2
+---
+## Simple Llama 3.1 8B-Instruct model quantized with GPTQ v2 using 256 rows of C4/en calibration data
+This is not a production-ready quantized model; it is intended only for comparing GPTQ v1 and GPTQ v2 after quantization.
+The GPTQ v1 counterpart is hosted at: https://huggingface.co/ModelCloud/GPTQ-v1-Llama-3.1-8B-Instruct
+## Eval script using GPTQModel (main branch) with the Marlin kernel and lm-eval (main branch)
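+```py
+# eval
+import tempfile
+
+from gptqmodel import GPTQModel
+from gptqmodel.utils.eval import EVAL
+from lm_eval.utils import make_table
+
+# Local directory or Hugging Face repo id of the quantized model (adjust as needed)
+QUANT_SAVE_PATH = "ModelCloud/GPTQ-v2-Llama-3.1-8B-Instruct"
+
+with tempfile.TemporaryDirectory() as tmp_dir:
+    # Run lm-eval through GPTQModel on ARC-Challenge and GSM8K-Platinum (CoT)
+    results = GPTQModel.eval(
+        QUANT_SAVE_PATH,
+        tasks=[EVAL.LM_EVAL.ARC_CHALLENGE, EVAL.LM_EVAL.GSM8K_PLATINUM_COT],
+        apply_chat_template=True,
+        random_seed=898,
+        output_path=tmp_dir,
+    )
+    # Print per-task metrics, then group-level aggregates if lm-eval reports any
+    print(make_table(results))
+    if "groups" in results:
+        print(make_table(results, "groups"))
+```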
+|      Tasks       |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
+|------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
+|arc_challenge     |      1|none            |     0|acc        |↑  |0.5034|±  |0.0146|
+|                  |       |none            |     0|acc_norm   |↑  |0.5068|±  |0.0146|
+|gsm8k_platinum_cot|      3|flexible-extract|     8|exact_match|↑  |0.7601|±  |0.0123|
+|                  |       |strict-match    |     8|exact_match|↑  |0.5211|±  |0.0144|
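The Stderr column is the standard error lm-eval reports for each score. As a rough way to read it, and to check whether a gap between this model and the GPTQ v1 quant exceeds sampling noise, the sketch below builds normal-approximation 95% intervals and a z-score for the difference of two scores. It assumes the two runs are independent and that the normal approximation holds; `RESULTS`, `ci95`, and `diff_z` are hypothetical helpers for illustration, not part of GPTQModel or lm-eval.

```python
import math

# Rows copied from the results table above: (task, filter, metric, value, stderr).
RESULTS = [
    ("arc_challenge", "none", "acc", 0.5034, 0.0146),
    ("arc_challenge", "none", "acc_norm", 0.5068, 0.0146),
    ("gsm8k_platinum_cot", "flexible-extract", "exact_match", 0.7601, 0.0123),
    ("gsm8k_platinum_cot", "strict-match", "exact_match", 0.5211, 0.0144),
]

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def ci95(value: float, stderr: float) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for one reported score."""
    return value - Z_95 * stderr, value + Z_95 * stderr


def diff_z(value_a: float, stderr_a: float, value_b: float, stderr_b: float) -> float:
    """z-score of (a - b), treating the two runs as independent estimates."""
    return (value_a - value_b) / math.sqrt(stderr_a ** 2 + stderr_b ** 2)


if __name__ == "__main__":
    # One interval per table row
    for task, filt, metric, value, stderr in RESULTS:
        lo, hi = ci95(value, stderr)
        print(f"{task:<18} {filt:<16} {metric:<11} {value:.4f}  95% CI [{lo:.4f}, {hi:.4f}]")
```

Running it prints one interval per row; for example, the flexible-extract GSM8K score of 0.7601 ± 0.0123 gives roughly [0.736, 0.784]. For a v1 vs. v2 comparison, pass the matching v1 value and stderr to `diff_z`: |z| below about 2 suggests the gap is within noise. Because both models answer the same questions, a paired test on per-sample outputs would be tighter; treating the runs as independent is usually a conservative shortcut.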