---
library_name: transformers
pipeline_tag: text-generation
inference: true
widget:
- text: Hello!
  example_title: Hello world
  group: Python
---

This model is for debugging. It is randomly initialized using the config from [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) but with a smaller size.

Code:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "yujiepan/meta-llama-3.1-tiny-random-hidden128"
quant_config = {
    "zero_point": True,
    "q_group_size": 64,
    "w_bit": 4,
    "version": "GEMM",
}

# Load the unquantized model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    use_cache=False,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantize to 4-bit AWQ using the config above
model.quantize(tokenizer, quant_config=quant_config)
```
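
As a follow-up, here is a minimal sketch of saving the quantized weights and running generation with them. This is not part of the original card: the `quant_path` directory name is an assumption chosen for illustration, and the reload step assumes `autoawq` is installed so that `transformers` can load the AWQ checkpoint.

```python
# A minimal sketch, assuming the quantization step above has completed.
# `quant_path` is a hypothetical local directory, not an official artifact.
quant_path = "meta-llama-3.1-tiny-random-hidden128-awq"

# Persist the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Reload with plain transformers (AWQ checkpoints load through
# AutoModelForCausalLM when autoawq is installed) and generate
from transformers import AutoModelForCausalLM

reloaded = AutoModelForCausalLM.from_pretrained(quant_path, device_map="cuda")
inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")
outputs = reloaded.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the model is randomly initialized, the generated text will be meaningless; the point of the debugging checkpoint is only to exercise the quantize-save-reload path quickly.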