# Jan-nano GPTQ 4bit (vLLM-ready)
This is a 4-bit GPTQ quantized version of Menlo/Jan-nano, optimized for fast inference with vLLM.
- Quantization: GPTQ (4-bit)
- Group size: 128
- Dtype: float16
- Backend: gptqmodel
- Max context length: 4096 tokens
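For reference, a quant with these settings can be produced with the gptqmodel library. The sketch below is illustrative, not the exact recipe used for this checkpoint: the calibration dataset, sample count, and output path are assumptions, and the API shown follows recent gptqmodel releases.

```python
# Minimal sketch: 4-bit, group-size-128 GPTQ quantization with gptqmodel.
# Calibration data and paths below are assumptions, not this repo's exact recipe.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# Small calibration set; any representative text corpus works.
calibration_data = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(512))["text"]

# Matches the settings listed above.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("Menlo/Jan-nano", quant_config)
model.quantize(calibration_data)
model.save("./jan-nano-4b-gptqmodel-4bit")
```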
## 🔧 Usage with vLLM
```bash
vllm serve ./jan-nano-4b-gptqmodel-4bit \
  --quantization gptq \
  --dtype half \
  --max-model-len 4096
```
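Once the server is up, it exposes an OpenAI-compatible API (on `http://localhost:8000` by default). A quick smoke test with the `openai` Python client; the prompt is just an example:

```python
# Query the vLLM server through its OpenAI-compatible endpoint.
# Assumes the default host/port; adjust base_url if you changed them.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./jan-nano-4b-gptqmodel-4bit",  # must match the path passed to `vllm serve`
    messages=[{"role": "user", "content": "Summarize GPTQ in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```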
## 📁 Files
- Sharded `.safetensors` model weights
- `model.safetensors.index.json`
- `tokenizer.json`, `tokenizer_config.json`
- `config.json`, `generation_config.json`, `quantize_config.json` (if available)
## 🙏 Credits

- Original model: [Menlo/Jan-nano](https://huggingface.co/Menlo/Jan-nano)