Intel/MiniMax-M1-80k-int4-AutoRound-gptq-inc-v0

Model Details

This model is an int4 quantization of MiniMaxAI/MiniMax-M1-80k with group_size 64 and symmetric quantization, generated by the intel/auto-round algorithm.
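To make "int4, group_size 64, symmetric" concrete, here is a minimal sketch of symmetric group-wise quantization with plain round-to-nearest. It is an illustration only: AutoRound tunes the rounding with signed gradient descent rather than rounding to nearest, and its exact scale convention may differ. The function names are hypothetical and not part of any library.

```python
import math


def quantize_group_sym(weights, bits=4):
    """Round-to-nearest symmetric quantization of one group of floats.

    Symmetric means there is no zero-point offset: value ~= code * scale.
    """
    qmax = 2 ** (bits - 1) - 1      # 7 for int4
    qmin = -(2 ** (bits - 1))       # -8 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=64, bits=4):
    """Split a row into consecutive groups, each with its own scale."""
    return [
        quantize_group_sym(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


# Toy data standing in for one row of a weight matrix (assumption).
row = [math.sin(i) for i in range(128)]
groups = quantize_row(row, group_size=64)

restored = []
for codes, scale in groups:
    restored.extend(dequantize_group(codes, scale))

max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.4f}")
```

Each group of 64 weights shares one scale, so a smaller group size tracks local weight ranges more closely at the cost of storing more scales.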

Please follow the license of the original model.

This model shows significant accuracy degradation compared to the original (MMLU 0.7455 vs. 0.7899). Caution is advised when deploying it.

How To Use

INT4 Inference (CPU/CUDA/Intel GPU)

For Intel GPU, auto-round>0.5.1 is required.
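The version requirement can be checked before loading the model. The following is a rough sketch using only the standard library; the helper names are hypothetical, and the simplified parser ignores pre- and post-release suffixes (for example, "0.5.1.post1" is treated as 0.5.1), so treat it as an approximation rather than a full version comparison.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.5.1' or '0.6.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def newer_than(installed, minimum="0.5.1"):
    """True if `installed` is strictly newer than `minimum`."""
    return parse_version(installed) > parse_version(minimum)


def auto_round_is_new_enough(minimum="0.5.1"):
    """Check the installed auto-round distribution, if there is one."""
    try:
        installed = metadata.version("auto-round")
    except metadata.PackageNotFoundError:
        return False
    return newer_than(installed, minimum)


if __name__ == "__main__":
    if not auto_round_is_new_enough():
        print("auto-round>0.5.1 not found; Intel GPU inference may not work.")
```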

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
MODEL_PATH = "Intel/MiniMax-M1-80k-int4-AutoRound-gptq-inc-v0"
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype="auto", device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
prompts = [
    "What is your favourite condiment?",
    "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!",
    "Do you have mayonnaise recipes?"
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": [{"type": "text", "text": prompt}]}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

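# Decoder-only models should be left-padded for batched generation,
# so that every prompt ends right where generation begins.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to(model.device)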

generation_config = GenerationConfig(
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id
)

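# Pass the attention mask so padded positions are ignored during generation.
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    generation_config=generation_config,
)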

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)


for prompt, generated in zip(prompts, response):
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated}")
    print("-" * 50)
    
"""
--------------------------------------------------
Prompt: What is your favourite condiment?
Generated: user name=user
What is your favourite condiment?
ai name=assistant
<think>
Okay, the user is asking about my favorite condiment. Let me think. First, I need to recall what a condiment is. Condiments are things like ketchup, mustard, mayo, etc., that you add to food to enhance flavor. But as an AI, I don't eat food, so I don't have personal preferences. But maybe I should answer in a way that's helpful or fun.

Wait, the user might be looking for a creative or humorous response. Since I don't have feelings or tastes, I can mention that I don't have a favorite, but I can talk about popular condiments or maybe joke about it. Alternatively, I could list some common condiments and maybe suggest one based on popularity or usage.

Alternatively, maybe I can say something like, "As an AI, I don't have preferences, but if I had to pick, I'd say..." and then mention a popular one. But I need to make sure I don't give a wrong answer. Let me check if there's a standard answer for this. Maybe people often ask this in a playful way, so I can respond with a light-hearted answer.

Alternatively, maybe I can say something like, "I don't have a favorite, but I know many people love ketchup or mayo!" But I should make sure to clarify that I don't have personal preferences. Maybe add a joke about how I don't eat, but if I did, I'd choose something. Hmm.

Wait, the user might be looking for a fun answer. Maybe I can say, "I don't have a favorite, but if I could taste, I'd probably go for something like sriracha or maybe peanut butter!" But I need to make sure it's clear that I'm just making a joke. Alternatively, maybe mention a condiment that's commonly used, like ketchup or mustard, but in a playful way.

Alternatively, maybe I can say, "I don't have a favorite, but I can tell you that the most popular condiments in the US are ketchup, mustard, and mayo!" But that's more of a factual answer. But the question is about my favorite, which I don't have. So maybe I should clarify that I don't have preferences, but then maybe add a fun fact about condiments.

Wait, the user might be testing if I can handle a question that's not straightforward. So I need to make sure I answer correctly. Since I don't have feelings or preferences, I should state that, but maybe add a humorous twist. For example, "As an AI, I don't have a favorite condiment, but if I
--------------------------------------------------

Prompt: Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!
Generated: user name=user
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!
ai name=assistant
<think>
Okay, the user mentioned they love fresh lemon juice and use it in their cooking. I need to figure out what they might be looking for. Maybe they want tips on using lemon juice, recipes, or how to store it? Let me think.

First, they said they add it to whatever they're cooking, so maybe they want ideas for different dishes. Or perhaps they're looking for the best way to use lemon juice in specific recipes. Also, they mentioned "fresh" lemon juice, so maybe they prefer it over bottled? I should highlight the benefits of fresh vs bottled.

Wait, they might also be interested in how to get the most juice out of a lemon. Maybe tips on juicing, like rolling the lemon before cutting, or using a citrus juicer. Also, storage tips—how long can you keep fresh lemon juice? Maybe they want to know if they can freeze it or something.

Another angle: maybe they want to know how to use lemon juice in different cuisines. Like, in Mediterranean dishes, or as a marinade. Or maybe they want to know how to balance the acidity in recipes. Also, could they be looking for substitutes if they don't have lemon juice? Like vinegar or lime juice?

Wait, the user said "zesty flavor," so maybe they want to know how to use lemon juice to enhance flavors in various dishes. Maybe they need ideas for recipes where lemon juice is a key ingredient. Or maybe they're looking for ways to use lemon juice in baking, like in cakes or cookies.

Also, considering the context, they might be a home cook looking for tips on using lemon juice effectively. Maybe they want to know how to store lemon juice, or how to use it in different dishes. Let me check if there's any specific question they have, but since they just mentioned they like using it, maybe they want more ideas or best practices.

I should structure the answer to cover different aspects: uses in cooking, tips for getting the most juice, storage, maybe some recipe ideas. Also, mention the health benefits if relevant. But since they didn't ask for health benefits, maybe focus on culinary uses.

Wait, the user might be looking for specific recipes where lemon juice is used, or how to incorporate it into various dishes. Maybe they want to know how to use it in marinades, dressings, or as a finishing touch. Also, maybe they want to know how to balance the acidity with other flavors.

Alternatively, maybe they want to know how to make lemon juice last longer, or how to use
--------------------------------------------------
Prompt: Do you have mayonnaise recipes?
Generated: user name=user
Do you have mayonnaise recipes?
ai name=assistant
<think>
Okay, the user is asking if I have mayonnaise recipes. Let me think. First, I need to recall if I have any mayonnaise recipes in my knowledge base. Wait, I remember that I can generate recipes based on the user's request. But I should check if there's a specific type of mayonnaise they want. Maybe they want a classic one, or a vegan version, or something else.

Wait, the user just asked if I have mayonnaise recipes. They didn't specify any particular type. So I should probably provide a basic mayonnaise recipe. But maybe I should also mention variations or alternatives in case they want something different. Let me think about the standard ingredients for mayonnaise. It's usually egg yolks, oil, mustard, lemon juice or vinegar, and seasoning. But some people might have dietary restrictions, like needing a vegan version using aquafaba or something else.

I should start by giving a classic recipe. Then maybe add some variations. Also, I should make sure to mention the ingredients and steps clearly. Let me check if I have any specific recipes stored. Wait, I don't have a database of recipes, but I can generate one based on common knowledge. So I can outline a basic mayonnaise recipe, then maybe suggest some variations like using different oils, adding herbs, or making it vegan.

Wait, but the user might just want a simple recipe. Let me structure the response. First, give the classic recipe with ingredients and steps. Then maybe mention possible variations. Also, note that homemade mayo can be tricky because of the emulsification process. Maybe include some tips for success, like adding oil slowly, using room temperature ingredients, etc.

Wait, but the user might not know that. So I should include some tips. Also, maybe mention that if the mayo breaks, they can fix it by adding a bit more water or another egg yolk. Hmm. Also, maybe mention using a blender or food processor if they don't want to do it by hand. But traditional mayo is made by hand with a whisk.

Wait, but some people use a blender or food processor for convenience. So maybe include both methods. Also, note that the egg yolk is raw, so if someone is concerned about raw eggs, they can use pasteurized eggs or a substitute like aquafaba for vegan mayo.

Wait, but the user didn't specify any dietary restrictions. So maybe just give the classic recipe first, then mention alternatives. Let me outline the steps. Let's see:

Classic Mayonnaise Recipe:
- Ingredients: egg yolks, oil, mustard, lemon
--------------------------------------------------


"""
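The sample outputs above open a `<think>` block and hit the 512-token limit before finishing it, so no final answer is produced. The helper below is a minimal sketch for separating reasoning from the answer in post-processing. It assumes the model closes its reasoning with a `</think>` tag, which the truncated samples do not show, and the function name is hypothetical.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split generated text into (reasoning, answer, finished).

    `finished` is False when the reasoning block was opened but never
    closed, which usually means generation stopped at max_new_tokens.
    """
    start = text.find(open_tag)
    if start == -1:
        # No reasoning block at all: everything is the answer.
        return "", text.strip(), True

    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        # Reasoning was cut off before the closing tag (assumed tag).
        return text[body_start:].strip(), "", False

    reasoning = text[body_start:end].strip()
    answer = text[end + len(close_tag):].strip()
    return reasoning, answer, True


if __name__ == "__main__":
    sample = "<think>\nThe user wants a recipe.\n</think>\nHere is a classic mayonnaise recipe."
    reasoning, answer, finished = split_reasoning(sample)
    print(finished, "|", answer)
```

If `finished` comes back False, raising `max_new_tokens` in the generation config is the likely remedy.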

Generate the model

For reference, quantization tuning was performed on a single 140GB GPU.

from auto_round import AutoRound
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MiniMaxAI/MiniMax-M1-80k"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", trust_remote_code=True)

# Symmetric quantization with one scale per group of 64 weights.
# 512 calibration samples of 2048 tokens each; batch_size=1 with
# 4 gradient accumulation steps keeps GPU memory usage low.
autoround = AutoRound(model=model, tokenizer=tokenizer, nsamples=512,
                      low_gpu_mem_usage=True, seqlen=2048, group_size=64, sym=True,
                      batch_size=1, gradient_accumulate_steps=4
                      )
# Export in the GPTQ-compatible AutoRound format.
autoround.quantize_and_save(format="auto_round:auto_gptq", output_dir="tmp_autoround/")
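For a rough sense of what `group_size=64` costs in storage, the sketch below estimates the effective bits per weight when each group carries one scale. It assumes 16-bit scales and ignores any stored zero points, packing overhead, and layers left unquantized, so the real checkpoint size will differ. The parameter count in the example is a placeholder, not the size of this model.

```python
def bits_per_weight(bits=4, group_size=64, scale_bits=16, zero_bits=0):
    """Effective storage per weight: the code plus its share of group metadata."""
    return bits + (scale_bits + zero_bits) / group_size


def weight_gib(num_params, bpw):
    """Approximate size in GiB of `num_params` weights at `bpw` bits each."""
    return num_params * bpw / 8 / 2 ** 30


if __name__ == "__main__":
    bpw = bits_per_weight(bits=4, group_size=64)
    # Placeholder parameter count for illustration only (assumption).
    example_params = 1_000_000_000
    print(f"{bpw} bits/weight, {weight_gib(example_params, bpw):.3f} GiB per 1e9 params")
```

Under these assumptions, halving the group size doubles the per-weight share of scale storage, which is the trade-off against the finer-grained scales.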

Evaluate the model

pip3 install lm-eval==0.4.9

lm-eval --model hf --model_args pretrained=Intel/MiniMax-M1-80k-int4-AutoRound-gptq-inc-v0  --tasks mmlu --batch_size 8

Metric	BF16(lm-eval==0.4.9)	INT4
mmlu	0.7899	0.7455
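The size of the drop in the table can be worked out directly from the two reported scores. This small calculation uses only the numbers above; the function name is illustrative.

```python
def accuracy_drop(baseline, quantized):
    """Return (absolute drop, drop relative to the baseline)."""
    absolute = baseline - quantized
    return absolute, absolute / baseline


bf16_mmlu, int4_mmlu = 0.7899, 0.7455
absolute, relative = accuracy_drop(bf16_mmlu, int4_mmlu)
print(f"absolute drop: {absolute:.4f}")   # absolute drop: 0.0444
print(f"relative drop: {relative:.2%}")   # relative drop: 5.62%
```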

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

arxiv github
