
Model Details

This GGUF model is a mixed-bit q2_k_s quantization of moonshotai/Kimi-K2-Instruct (1,026B parameters, mapped to the deepseek2 architecture in llama.cpp), generated by the intel/auto-round algorithm.

How To Use

Requirements

Please follow the Build llama.cpp locally guide to install the necessary dependencies.
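As a minimal sketch, a typical CPU-only CMake build looks like the following (see the official guide for backend-specific options such as CUDA):

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Note that this model ships as multiple GGUF shards; passing the first shard (…-00001-of-00013.gguf) to llama-cli is expected to be enough, since llama.cpp picks up the remaining splits automatically.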

Inference

>>> text="9.11和9.8哪个数字大"
>>> ./llama-cli -m Kimi-K2-Instruct-BF16-384x14B-Q2_K_S-00001-of-00013.gguf -p "You are Kimi, an AI assistant created by Moonshot AI.$text" -n 512 --threads 16 -no-cnv
## Generated:
### Thought Process:
1. **Understand the question**: The question asks which of 9.11 and 9.8 is larger, i.e., a comparison of two decimals.
2. **Identify the numbers**: 9.11 and 9.8 are both decimals and can be split into an integer part and a fractional part.
   - 9.11: integer part 9, fractional part 0.11.
   - 9.8: integer part 9, fractional part 0.8.
3. **Compare the integer parts**: Both integer parts are 9, so the integer parts alone cannot decide; the fractional parts must be compared.
4. **Compare the fractional parts**:
   - The fractional parts are 0.11 and 0.8.
   - Align the digit counts for comparison: 0.11 is 11/100 and 0.8 is 80/100.
   - Comparing 11 and 80, clearly 80 > 11.
5. **Conclusion**: Since the fractional part 0.8 > 0.11, 9.8 > 9.11.

### Key Concepts:
- When comparing decimals, compare the integer parts first; if they are equal, compare the fractional parts.
- Fractional parts can be zero-padded to the same number of digits (e.g., 0.8 can be read as 0.80) to make the comparison easier.

### Final Answer: 9.8



>>> text="strawberry中有几个r?"
>>> ./llama-cli -m Kimi-K2-Instruct-BF16-384x14B-Q2_K_S-00001-of-00013.gguf -p "You are Kimi, an AI assistant created by Moonshot AI.$text" -n 512 --threads 16 -no-cnv
## Generated:
### Thought Process:

To determine how many times the letter "r" appears in the word "strawberry," we can break it down step by step:

1. **List each letter in the word**: s, t, r, a, w, b, e, r, r, y.
2. **Identify all instances of the letter "r"**:
   - The 3rd letter is "r".
   - The 8th letter is "r".
   - The 9th letter is "r".
3. **Count the occurrences**: There are 3 instances of "r" in "strawberry".

### Final Answer: 3


>>> text="There is a girl who likes adventure,"
>>> ./llama-cli -m Kimi-K2-Instruct-BF16-384x14B-Q2_K_S-00001-of-00013.gguf -p "You are Kimi, an AI assistant created by Moonshot AI.$text" -n 512 --threads 16 -no-cnv
## Generated:
…and her name is Kimi.

She packs a small red backpack with a flashlight, a folded map, and a pocket-sized notebook. Every Saturday she picks a new spot on the map—sometimes a forest trail, sometimes an abandoned lighthouse, sometimes just a crooked alley she’s never walked. She goes alone, because solitude makes the colors sharper and the stories clearer.

When she reaches the place, she opens her notebook and writes three things:
1. What the wind smells like.
2. The first sound she hears that no one else has noticed.
3. One question she still can’t answer.

She does this because adventure isn’t only about where you go; it’s about how the place rearranges the inside of your head. After she’s filled a page, she stands still, eyes closed, and counts to thirty. By the time she opens them, the world always looks a little different—brighter, riskier, kinder.

Then she walks home, already planning next Saturday’s tiny, enormous voyage.


>>> text="Please give a brief introduction of Moonshot AI."
>>> ./llama-cli -m Kimi-K2-Instruct-BF16-384x14B-Q2_K_S-00001-of-00013.gguf -p "You are Kimi, an AI assistant created by Moonshot AI.$text" -n 512 --threads 16 -no-cnv
## Generated:
Moonshot AI is a Chinese artificial intelligence company founded in 2023, focusing on developing large language models and related technologies. The company's flagship product is Kimi, a smart assistant and AI chatbot powered by its self-developed large language model. Moonshot AI aims to advance Chinese-language AI capabilities and provide users with intelligent, efficient, and friendly interaction experiences.
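All of the examples above pass -no-cnv to generate a single completion and exit. As a hedged aside, omitting -no-cnv should leave llama-cli in its default interactive conversation mode, which applies the model's chat template:

>>> ./llama-cli -m Kimi-K2-Instruct-BF16-384x14B-Q2_K_S-00001-of-00013.gguf --threads 16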

Generate the Model

Tuning

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Kimi-K2-Instruct-BF16"  # must be a BF16 source model

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cpu",
                                             torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

layer_config = {}
for n, m in model.named_modules():
    if n == "lm_head" or isinstance(m, torch.nn.Embedding):
        # keep the output head and the embeddings at 8 bits
        layer_config[n] = {"bits": 8}
        print(n, 8)
    elif isinstance(m, torch.nn.Linear) and ("expert" not in n or "shared_experts" in n):
        # non-expert linear layers (and the shared experts) get 4 bits
        layer_config[n] = {"bits": 4}
        print(n, 4)
    else:
        # everything else, notably the routed expert layers, falls back to
        # the 2-bit default of the gguf:q2_k_s target
        print(n, 2)


# iters=0 skips AutoRound's iterative tuning loop, i.e. plain RTN-style rounding
autoround = AutoRound(model, tokenizer, iters=0, layer_config=layer_config,
                      batch_size=8, nsamples=512, low_gpu_mem_usage=True)
autoround.quantize_and_save("tmp_autoround", format="gguf:q2_k_s")
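Once quantize_and_save finishes, the exported GGUF under tmp_autoround can be run with the same llama.cpp build. A minimal sketch; the file name below is illustrative, since the actual name depends on what AutoRound writes to the output directory:

>>> ./llama-cli -m tmp_autoround/Kimi-K2-Instruct-BF16-q2_k_s.gguf -p "Hello" -n 64 --threads 16 -no-cnv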

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

  • Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

