---
license: apache-2.0
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- gptqmodel
- gptq
- v2
---

## Llama 3.1 8B-Instruct quantized with GPTQ v2 using 256 rows of C4/en calibration data

This is not a production-ready quantized model; it was made to compare GPTQ v1 against GPTQ v2 post-quantization.

The matching GPTQ v1 quant is hosted at: https://huggingface.co/ModelCloud/GPTQ-v1-Llama-3.1-8B-Instruct
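
For reference, the quantization step looks roughly like the sketch below. This is a minimal reconstruction, not the exact script: it assumes GPTQModel's standard `QuantizeConfig`/`load`/`quantize` API and its `v2=True` toggle for GPTQ v2, and the C4 data file, bit width, and group size shown are illustrative. The exact configuration is in the reproduction link further down.

```py
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# 256 rows of C4/en as calibration data (this particular shard is illustrative)
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(256))["text"]

# v2=True switches GPTQModel from GPTQ v1 to GPTQ v2 quantization;
# bits/group_size here are assumed, not confirmed by this card
quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)

model = GPTQModel.load("meta-llama/Llama-3.1-8B-Instruct", quant_config)
model.quantize(calibration, batch_size=1)
model.save("Llama-3.1-8B-Instruct-gptq-v2")  # becomes QUANT_SAVE_PATH in the eval script
```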

## Eval Script using GPTQModel (main branch) and Marlin kernel + lm-eval (main branch)

```py
import tempfile

from gptqmodel import GPTQModel
from gptqmodel.utils.eval import EVAL
from lm_eval.utils import make_table

# Path the quantized model was saved to by the quantization script
QUANT_SAVE_PATH = "Llama-3.1-8B-Instruct-gptq-v2"

# eval
with tempfile.TemporaryDirectory() as tmp_dir:
    results = GPTQModel.eval(
        QUANT_SAVE_PATH,
        tasks=[EVAL.LM_EVAL.ARC_CHALLENGE, EVAL.LM_EVAL.GSM8K_PLATINUM_COT],
        apply_chat_template=True,
        random_seed=898,
        output_path=tmp_dir,
    )

print(make_table(results))
if "groups" in results:
    print(make_table(results, "groups"))
```

Full quantization and eval reproduction code: https://github.com/ModelCloud/GPTQModel/issues/1545#issuecomment-2811997133

| Tasks              | Version | Filter           | n-shot | Metric      |   | Value  |   | Stderr |
|--------------------|--------:|------------------|-------:|-------------|---|-------:|---|-------:|
| arc_challenge      |       1 | none             |      0 | acc         | ↑ | 0.5034 | ± | 0.0146 |
|                    |         | none             |      0 | acc_norm    | ↑ | 0.5068 | ± | 0.0146 |
| gsm8k_platinum_cot |       3 | flexible-extract |      8 | exact_match | ↑ | 0.7601 | ± | 0.0123 |
|                    |         | strict-match     |      8 | exact_match | ↑ | 0.5211 | ± | 0.0144 |
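
To sanity-check the quant locally, it can be loaded back through GPTQModel. A minimal sketch, assuming this repo's model id (a local save path works too); GPTQModel selects an available kernel such as Marlin automatically:

```py
from gptqmodel import GPTQModel

# Assumed id for this repo; substitute a local QUANT_SAVE_PATH if preferred
model = GPTQModel.load("ModelCloud/GPTQ-v2-Llama-3.1-8B-Instruct")

tokens = model.generate("Uncovering deep insights begins with")[0]  # token ids
print(model.tokenizer.decode(tokens))  # decoded string output
```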