

---
license: apache-2.0
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
tags:
  - gptqmodel
  - gptq
  - v2
---

A simple Llama 3.1 8B-Instruct model quantized with GPTQ v2, using 256 rows of C4/en calibration data.

This is not a production-ready quantized model. It exists to compare GPTQ v1 and GPTQ v2 after quantization.

The GPTQ v1 counterpart is hosted at: https://huggingface.co/ModelCloud/GPTQ-v1-Llama-3.1-8B-Instruct

Evaluation script, using GPTQModel (main branch) with the Marlin kernel and lm-eval (main branch):

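```python
# eval
import tempfile

from gptqmodel import GPTQModel
from gptqmodel.utils.eval import EVAL
from lm_eval.utils import make_table

# QUANT_SAVE_PATH: path of the quantized model, defined in the full
# reproduction script linked below.

with tempfile.TemporaryDirectory() as tmp_dir:
    # Run the lm-eval tasks against the quantized model
    results = GPTQModel.eval(
        QUANT_SAVE_PATH,
        tasks=[EVAL.LM_EVAL.ARC_CHALLENGE, EVAL.LM_EVAL.GSM8K_PLATINUM_COT],
        apply_chat_template=True,
        random_seed=898,
        output_path=tmp_dir,
    )

    # Per-task metrics
    print(make_table(results))
    # Group-level aggregates, only present if a task belongs to a group
    if "groups" in results:
        print(make_table(results, "groups"))
```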

Full quantization and eval reproduction code: https://github.com/ModelCloud/GPTQModel/issues/1545#issuecomment-2811997133

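|      Tasks       |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge     |      1|none            |     0|acc        |↑  |0.5034|±  |0.0146|
|                  |       |none            |     0|acc_norm   |↑  |0.5068|±  |0.0146|
|gsm8k_platinum_cot|      3|flexible-extract|     8|exact_match|↑  |0.7601|±  |0.0123|
|                  |       |strict-match    |     8|exact_match|↑  |0.5211|±  |0.0144|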
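
The Stderr column can be sanity-checked against the Value column, which helps when judging whether a GPTQ v1 vs GPTQ v2 gap is real. This is a minimal sketch under two assumptions not stated in this README: that lm-eval reports the standard error of a mean metric as the sample standard deviation (n − 1 denominator) divided by √n, and that the evaluated splits have 1172 (ARC-Challenge test) and 1209 (GSM8K-Platinum) questions. Every per-question score here is 0 or 1, so the standard error reduces to √(p(1 − p)/(n − 1)). The helper names are illustrative only.

```python
import math

# ASSUMED split sizes (not given in this README):
# ARC-Challenge test = 1172 questions, GSM8K-Platinum = 1209 questions.
N_ARC_CHALLENGE = 1172
N_GSM8K_PLATINUM = 1209


def binary_mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n scores that are each 0 or 1.

    Sample standard deviation (ddof=1) divided by sqrt(n); for 0/1 scores
    this simplifies to sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


def ci95(p: float, stderr: float) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval around p."""
    half_width = 1.96 * stderr
    return p - half_width, p + half_width


# Values copied from the results table above
rows = [
    ("arc_challenge acc", 0.5034, N_ARC_CHALLENGE),
    ("arc_challenge acc_norm", 0.5068, N_ARC_CHALLENGE),
    ("gsm8k_platinum_cot flexible-extract", 0.7601, N_GSM8K_PLATINUM),
    ("gsm8k_platinum_cot strict-match", 0.5211, N_GSM8K_PLATINUM),
]

for name, value, n in rows:
    se = binary_mean_stderr(value, n)
    lo, hi = ci95(value, se)
    print(f"{name}: stderr={se:.4f}, 95% CI=[{lo:.4f}, {hi:.4f}]")
```

Under these assumptions the recomputed standard errors round to the reported 0.0146, 0.0146, 0.0123 and 0.0144. A v1 vs v2 difference that is small relative to these intervals (roughly ±0.03 on ARC-Challenge) should be read with caution.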