AWQ Quant of nvidia/Llama-3.1-Nemotron-Nano-8B-v1

AWQ quantization of nvidia/Llama-3.1-Nemotron-Nano-8B-v1, produced with llm-compressor.

Downloading quants with huggingface-cli

Install huggingface-cli:

pip install -U "huggingface_hub[cli]"

Download a quant by targeting its specific quant revision (branch):

huggingface-cli download ArtusDev/nvidia_Llama-3.1-Nemotron-Nano-8B-v1-AWQ --revision <branch>