
Orpheus-3b-0.1 AWQ 4-bit Quantized

This repository contains a 4-bit AWQ quantized version of canopylabs/orpheus-3b-0.1-pretrained.

Model Details

  • Original Model: canopylabs/orpheus-3b-0.1-pretrained
  • Quantization Method: AWQ (Activation-aware Weight Quantization)
  • Bit Precision: 4-bit
  • Group Size: 128
  • Zero Point: Enabled
  • Version: GEMM

Usage

You can load this model using the AutoAWQ library:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "chatboo/orpheus-3b-0.1-awq-4bit"

# Load the quantized weights (AutoAWQ places the model on the GPU) and tokenizer
model = AutoAWQForCausalLM.from_quantized(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Example usage: tokenize a prompt and move it to the same device as the model
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quantization Configuration

The model was quantized with the following configuration:

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM"
}
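For reference, a quantization run with this configuration can be reproduced with AutoAWQ along the following lines. This is a minimal sketch: the output path is a placeholder and the calibration data is left at AutoAWQ's default, so it may not match exactly what was used for this repository.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "canopylabs/orpheus-3b-0.1-pretrained"
quant_path = "orpheus-3b-0.1-awq-4bit"  # hypothetical local output directory

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM"
}

# Load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Run activation-aware quantization (AutoAWQ uses its default calibration
# dataset unless calibration data is passed explicitly)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)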

Benefits of Quantization

  • Reduced Size: The 4-bit weights take roughly a quarter of the storage of the original 16-bit checkpoint.
  • Memory Efficiency: Lower memory requirements for inference (see the sketch after this list).
  • Faster Inference: Potentially faster inference, especially on hardware with INT4 acceleration.
  • Similar Performance: Retains most of the quality of the original model.
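To check the memory claim on your own hardware, here is a minimal sketch that measures peak GPU memory during generation. It assumes a CUDA device and reuses the model and tokenizer objects loaded in the Usage section above; running the same snippet against the original full-precision model gives a point of comparison.

import torch

# Reset the peak-memory counter, then generate with the quantized model
torch.cuda.reset_peak_memory_stats()

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=50)

# Report the peak GPU memory allocated during generation
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory during generation: {peak_gib:.2f} GiB")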