---
license: apache-2.0
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- gptqmodel
- gptq
- v2
---

## Llama 3.1 8B-Instruct quantized with GPTQ v2 using 256 rows of C4/en calibration data

This is not a production-ready quantized model; it was produced to compare GPTQ v1 against GPTQ v2 post-quantization.

The GPTQ v1 counterpart is hosted at: https://huggingface.co/ModelCloud/GPTQ-v1-Llama-3.1-8B-Instruct
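
For context, quantization along these lines can be done with GPTQModel's main-branch API. The sketch below is illustrative, not the exact script used: the output path, calibration shard, and `QuantizeConfig` fields (`bits`, `group_size`, and the `v2` flag) are assumptions; the authoritative reproduction code is linked at the end of this card.

```py
# Hedged sketch of GPTQ v2 quantization with 256 rows of C4/en calibration data.
# Not the exact script used; see the linked GitHub issue for the real code.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
QUANT_SAVE_PATH = "Llama-3.1-8B-Instruct-gptq-v2"  # illustrative output dir

# 256 rows of C4 (en) text used as calibration data.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00000-of-01024.json.gz",
    split="train",
).select(range(256))["text"]

# v2=True switches GPTQModel from GPTQ v1 to GPTQ v2 quantization
# (assumed main-branch flag; exact fields may differ by version).
quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)

model = GPTQModel.load(MODEL_ID, quant_config)
model.quantize(calibration, batch_size=1)
model.save(QUANT_SAVE_PATH)
```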

## Eval script using GPTQModel (main branch) with the Marlin kernel and lm-eval (main branch)

```py
import tempfile

from gptqmodel import GPTQModel
from gptqmodel.utils.eval import EVAL
from lm_eval.utils import make_table

# Path (or HF repo id) of the quantized model to evaluate.
QUANT_SAVE_PATH = "ModelCloud/GPTQ-v2-Llama-3.1-8B-Instruct"

with tempfile.TemporaryDirectory() as tmp_dir:
    results = GPTQModel.eval(
        QUANT_SAVE_PATH,
        tasks=[EVAL.LM_EVAL.ARC_CHALLENGE, EVAL.LM_EVAL.GSM8K_PLATINUM_COT],
        apply_chat_template=True,
        random_seed=898,
        output_path=tmp_dir,
    )

    print(make_table(results))
    if "groups" in results:
        print(make_table(results, "groups"))
```
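
GPTQModel normally auto-selects a compatible kernel at load time; to pin the Marlin kernel explicitly for post-quant inference, the backend can be forced. A minimal sketch (the repo id is assumed to be this card's model):

```py
from gptqmodel import BACKEND, GPTQModel

# Force the Marlin kernel instead of letting GPTQModel auto-select one.
model = GPTQModel.load(
    "ModelCloud/GPTQ-v2-Llama-3.1-8B-Instruct",  # assumed repo id for this card
    backend=BACKEND.MARLIN,
)

tokens = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(tokens))
```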

Full quantization and eval reproduction code: https://github.com/ModelCloud/GPTQModel/issues/1545#issuecomment-2811997133


|      Tasks       |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge     |      1|none            |     0|acc        |↑  |0.5034|±  |0.0146|
|                  |       |none            |     0|acc_norm   |↑  |0.5068|±  |0.0146|
|gsm8k_platinum_cot|      3|flexible-extract|     8|exact_match|↑  |0.7601|±  |0.0123|
|                  |       |strict-match    |     8|exact_match|↑  |0.5211|±  |0.0144|