unsloth/QwQ-32B-unsloth-bnb-4bit

@@ -9,9 +9,13 @@ tags:
 - chat
 - qwen
 ---
 <div>
   <p style="margin-bottom: 0; margin-top: 0;">
-      <strong>This is Qwen-QwQ-32B with our bug fixes. <br> See <a href="https://huggingface.co/collections/unsloth/qwen-qwq-32b-collection-676b3b29c20c09a8c71a6235">our collection</a> for versions of QwQ-32B with our bug fixes including GGUF & 4-bit formats.</strong>
   </p>
   <p style="margin-bottom: 0;">
     <em>Unsloth's QwQ-32B <a href="https://unsloth.ai/blog/dynamic-4bit">Dynamic Quants</a> are selectively quantized, greatly improving accuracy over standard 4-bit.</em>
@@ -23,17 +27,51 @@ tags:
     <a href="https://discord.gg/unsloth">
       <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
     </a>
-    <a href="https://docs.unsloth.ai/">
       <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
     </a>
   </div>
 <h1 style="margin-top: 0rem;">Finetune your own Reasoning model like R1 with Unsloth!</h1>
 </div>
-We have a free Google Colab notebook for turning Qwen2.5 (3B) into a reasoning model: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(3B)-GRPO.ipynb
-## ✨ Finetune for Free
 All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

 - chat
 - qwen
 ---
+> [!NOTE]
+> To fix endless generations and for instructions on how to run QwQ-32B, view our [Tutorial here](https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively).
+>
 <div>
   <p style="margin-bottom: 0; margin-top: 0;">
+      <strong>Qwen-QwQ-32B with our bug fixes. <br> See <a href="https://huggingface.co/collections/unsloth/qwen-qwq-32b-collection-676b3b29c20c09a8c71a6235">our collection</a> for versions of QwQ-32B with our bug fixes including GGUF & 4-bit formats.</strong>
   </p>
   <p style="margin-bottom: 0;">
     <em>Unsloth's QwQ-32B <a href="https://unsloth.ai/blog/dynamic-4bit">Dynamic Quants</a> are selectively quantized, greatly improving accuracy over standard 4-bit.</em>
     <a href="https://discord.gg/unsloth">
       <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
     </a>
+    <a href="https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively">
       <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
     </a>
   </div>
 <h1 style="margin-top: 0rem;">Finetune your own Reasoning model like R1 with Unsloth!</h1>
 </div>
+To run the GGUF version of this model with llama.cpp, download a quant and then launch `llama-cli`:
+```python
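+import os
+# Optional faster downloads; requires `pip install hf_transfer`.
+# Set this before importing huggingface_hub so the setting is picked up.
+os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
+from huggingface_hub import snapshot_download
+snapshot_download(
+    repo_id = "unsloth/QwQ-32B-GGUF",
+    local_dir = "unsloth-QwQ-32B-GGUF",
+    allow_patterns = ["*Q4_K_M*"], # Download only the Q4_K_M quant
+)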
+```
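The `allow_patterns` filter above limits the download to one quantization. As a rough illustration, assuming the patterns behave like shell-style globs as implemented by Python's `fnmatch`, the sketch below shows which names a pattern such as `*Q4_K_M*` would keep. The file list and the `select_files` helper are hypothetical and exist only for this example; the real repository contents may differ.

```python
from fnmatch import fnmatch

# Hypothetical file listing, for illustration only.
repo_files = [
    "QwQ-32B-Q2_K.gguf",
    "QwQ-32B-Q4_K_M.gguf",
    "QwQ-32B-Q8_0.gguf",
    "README.md",
]
allow_patterns = ["*Q4_K_M*"]


def select_files(files, patterns):
    """Keep every file name that matches at least one glob pattern."""
    return [name for name in files if any(fnmatch(name, p) for p in patterns)]


selected = select_files(repo_files, allow_patterns)
print(selected)
```

Swapping the pattern for another quant name (for example `*Q8_0*`) would select that file instead, which is a cheap way to check a pattern before starting a large download.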
+```bash
+./llama.cpp/llama-cli \
+    --model unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf \
+    --threads 32 \
+    --ctx-size 16384 \
+    --n-gpu-layers 99 \
+    --seed 3407 \
+    --prio 2 \
+    --temp 0.6 \
+    --repeat-penalty 1.1 \
+    --dry-multiplier 0.5 \
+    --min-p 0.1 \
+    --top-k 40 \
+    --top-p 0.95 \
+    -no-cnv \
+    --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc" \
+    --prompt "<|im_start|>user\nCreate a Flappy Bird game in Python.<|im_end|>\n<|im_start|>assistant\n<think>\n"
+```
+See https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-without-bugs for more details!
+> [!NOTE]
+> To stop infinite generations, add `--samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"` to the `llama-cli` command.
+>
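Because the sampler order is easy to drop when a command is retyped by hand, one option is to assemble the invocation in a small script. The sketch below is only an illustration: `build_llama_cli_args` is a hypothetical helper, not part of llama.cpp or Unsloth, and it reuses only the flags and values from the command above. It prints the command rather than running it.

```python
import shlex

# Sampler order taken from the note above.
SAMPLERS = "top_k;top_p;min_p;temperature;dry;typ_p;xtc"


def build_llama_cli_args(model_path, prompt, samplers=SAMPLERS):
    """Return an argument list for llama-cli (hypothetical helper)."""
    return [
        "./llama.cpp/llama-cli",
        "--model", model_path,
        "--temp", "0.6",
        "--repeat-penalty", "1.1",
        "--dry-multiplier", "0.5",
        "--min-p", "0.1",
        "--top-k", "40",
        "--top-p", "0.95",
        "-no-cnv",
        "--samplers", samplers,
        "--prompt", prompt,
    ]


args = build_llama_cli_args(
    "unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf",
    "<|im_start|>user\nCreate a Flappy Bird game in Python.",
)
# Print a shell-quoted form for inspection.
print(shlex.join(args))
```

To actually launch the model, the same list could be passed to `subprocess.run(args, check=True)`, assuming llama.cpp has been built at `./llama.cpp`.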
+# ✨ Finetune for Free
+We have a free Google Colab notebook for turning Qwen2.5 (3B) into a reasoning model: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(3B)-GRPO.ipynb
 All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.