---
license: apache-2.0
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- gptqmodel
- gptq
- v2
---

## Llama 3.1 8B-Instruct quantized with GPTQ v2 using 256 rows of C4/en calibration data

This is not a production-ready quantized model; it was made to compare GPTQ v1 against GPTQ v2 post-quantization.

The matching GPTQ v1 quant is hosted at: https://huggingface.co/ModelCloud/GPTQ-v1-Llama-3.1-8B-Instruct
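
For reference, the quantization step looks roughly like the sketch below. This is a minimal reconstruction, not the exact script: it assumes GPTQModel's standard `QuantizeConfig`/`load`/`quantize` API and its `v2=True` toggle for GPTQ v2, and the C4 data file, bit width, and group size shown are illustrative. The exact configuration is in the reproduction link further down.

```py
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# 256 rows of C4/en as calibration data (this particular shard is illustrative)
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(256))["text"]

# v2=True switches GPTQModel from GPTQ v1 to GPTQ v2 quantization;
# bits/group_size here are assumed, not confirmed by this card
quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)

model = GPTQModel.load("meta-llama/Llama-3.1-8B-Instruct", quant_config)
model.quantize(calibration, batch_size=1)
model.save("Llama-3.1-8B-Instruct-gptq-v2")  # becomes QUANT_SAVE_PATH in the eval script
```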

## Eval Script using GPTQModel (main branch) and Marlin kernel + lm-eval (main branch)

```py
import tempfile

from gptqmodel import GPTQModel
from gptqmodel.utils.eval import EVAL
from lm_eval.utils import make_table

# Path the quantized model was saved to by the quantization script
QUANT_SAVE_PATH = "Llama-3.1-8B-Instruct-gptq-v2"

# eval
with tempfile.TemporaryDirectory() as tmp_dir:
    results = GPTQModel.eval(
        QUANT_SAVE_PATH,
        tasks=[EVAL.LM_EVAL.ARC_CHALLENGE, EVAL.LM_EVAL.GSM8K_PLATINUM_COT],
        apply_chat_template=True,
        random_seed=898,
        output_path=tmp_dir,
    )

print(make_table(results))
if "groups" in results:
    print(make_table(results, "groups"))
```

Full quantization and eval reproduction code: https://github.com/ModelCloud/GPTQModel/issues/1545#issuecomment-2811997133

| Tasks              | Version | Filter           | n-shot | Metric      |   | Value  |   | Stderr |
|--------------------|--------:|------------------|-------:|-------------|---|-------:|---|-------:|
| arc_challenge      |       1 | none             |      0 | acc         | ↑ | 0.5034 | ± | 0.0146 |
|                    |         | none             |      0 | acc_norm    | ↑ | 0.5068 | ± | 0.0146 |
| gsm8k_platinum_cot |       3 | flexible-extract |      8 | exact_match | ↑ | 0.7601 | ± | 0.0123 |
|                    |         | strict-match     |      8 | exact_match | ↑ | 0.5211 | ± | 0.0144 |
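
To sanity-check the quant locally, it can be loaded back through GPTQModel. A minimal sketch, assuming this repo's model id (a local save path works too); GPTQModel selects an available kernel such as Marlin automatically:

```py
from gptqmodel import GPTQModel

# Assumed id for this repo; substitute a local QUANT_SAVE_PATH if preferred
model = GPTQModel.load("ModelCloud/GPTQ-v2-Llama-3.1-8B-Instruct")

tokens = model.generate("Uncovering deep insights begins with")[0]  # token ids
print(model.tokenizer.decode(tokens))  # decoded string output
```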