Update README.md
README.md
CHANGED
@@ -12,6 +12,7 @@ tags:
 - custom-reward
 - trl
 - llm
+- adapter
 library_name: transformers
 model_name: newmindai/QwQ-32B-r1
 pipeline_tag: text-generation
@@ -21,11 +22,15 @@ datasets:
 
 # Overview
 
-
+**newmindai/QwQ-32B-r1** is a **LoRA adapter**, fine-tuned via **Reinforcement Learning (RL)** on top of the base model `QwQ-32B`. It incorporates:
 
 - **ORMs** (Open Reward Modules)
 - **DAPO** (Decoder Appearance Optimization)
-- **SimpleScaling** (loss
+- **SimpleScaling** (Multi-objective loss balancing)
+
+> This is an **adapter**, not a fully merged model. To use it, you must load it on top of the base model (`Qwen/QwQ-32B`) using the `peft` library.
+
+---
 
 ## Training Setup
 
@@ -37,8 +42,6 @@ This model was fine-tuned using **Reinforcement Learning** on top of a pretraine
 
 ### Reward Modules (ORMs)
 
-The following reward functions guided RL fine-tuning:
-
 | Reward Function | Description |
 |-------------------|-------------------------------------------------------|
 | `math` | Evaluates symbolic math correctness (MathORM) |
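The card tags this run with `trl` and `custom-reward`, but the hunk above does not show how reward functions such as `math` are wired into the RL trainer. Below is a minimal sketch of that pattern, assuming TRL's `GRPOTrainer`; the toy `math_reward` function and the placeholder dataset are illustrative, not the card's actual implementation.

```python
# Illustrative sketch only: passing a custom reward callable to a TRL trainer.
# The real ORMs, trainer, and data used for QwQ-32B-r1 are not documented here.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def math_reward(completions, **kwargs):
    """Toy stand-in for the `math` ORM: rewards completions that contain a
    boxed final answer."""
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

trainer = GRPOTrainer(
    model="Qwen/QwQ-32B",
    reward_funcs=[math_reward],  # several ORMs can be listed here
    args=GRPOConfig(output_dir="qwq-32b-r1-grpo"),
    train_dataset=dataset,
)
trainer.train()
```

Each callable receives the generated completions and returns one score per sample; listing several callables lets the trainer combine multiple reward signals.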
@@ -55,27 +58,49 @@ These were combined and scaled during training with adaptive weighting.
 - **DAPO (Appearance Optimization):** Regularizes attention and layout structure in decoder outputs.
 - **SimpleScaling** ([`newmindai/simplescaling`](https://huggingface.co/newmindai/simplescaling)): Controls optimizer behavior and reward balance across multiple objectives.
 
+---
+
 ## Training Regime
 
 - **Stage 1 (Wait #1):** Model explores reward landscape; initial rewards unstable.
 - **Stage 2 (Wait #2):** Convergence improves as ORM signals align.
 - **Aha Moment:** Clear gains in math and formatting scores around ~2K steps after warm-up.
 
+---
+
 ## Evaluation
 
 🐍 **Mezura-SnakeBench Benchmarking**
-
+Final performance was benchmarked using the [Mezura](https://huggingface.co/spaces/newmindai/Mezura) SnakeBench framework — a standardized evaluation suite developed by NewmindAI for structured Turkish NLP tasks.
+
+---
 
+## Usage Example (LoRA Adapter)
 
-
+This adapter must be loaded on top of the base model `Qwen/QwQ-32B` using the [`peft`](https://github.com/huggingface/peft) library:
 
 ```python
-from transformers import
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from peft import PeftModel
+import torch
+
+base_model_id = "Qwen/QwQ-32B"
+adapter_id = "newmindai/QwQ-32B-r1"
+
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+
+# Load base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    base_model_id,
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
 
-
-model =
-tokenizer = AutoTokenizer.from_pretrained(model_id)
+# Load LoRA adapter
+model = PeftModel.from_pretrained(base_model, adapter_id)
 
+# Inference
 prompt = "Türkiye'nin en yüksek dağı nedir?"
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 outputs = model.generate(**inputs, max_new_tokens=100)
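The usage-example hunk ends at `model.generate`, before the generated tokens are turned back into text. A typical continuation (assumed here, not shown in the diff) is:

```python
# Decode the generated tokens into a string (standard transformers usage).
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```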
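The last hunk header quotes the card's note that the reward signals "were combined and scaled during training with adaptive weighting", and the SimpleScaling bullet mentions balancing rewards across objectives, but no combination rule is given. The sketch below shows one generic scheme (inverse-variance weighting of per-module rewards); it is an assumption for illustration, not the SimpleScaling implementation.

```python
# Illustrative only: adaptively weight several ORM reward streams so that
# noisier modules contribute less to the combined scalar reward.
import numpy as np

class AdaptiveRewardCombiner:
    """Keeps a running history per reward module and weights each module by
    the inverse of its observed variance, normalized to sum to 1."""

    def __init__(self, names, eps=1e-6):
        self.names = list(names)
        self.history = {name: [] for name in self.names}
        self.eps = eps

    def __call__(self, rewards):
        # rewards: dict mapping module name -> list of per-sample scores
        for name in self.names:
            self.history[name].extend(rewards[name])
        raw = {name: 1.0 / (np.var(self.history[name]) + self.eps) for name in self.names}
        total = sum(raw.values())
        weights = {name: value / total for name, value in raw.items()}
        num_samples = len(rewards[self.names[0]])
        return [
            sum(weights[name] * rewards[name][i] for name in self.names)
            for i in range(num_samples)
        ]

combiner = AdaptiveRewardCombiner(["math", "format"])
combined = combiner({"math": [1.0, 0.0, 1.0], "format": [0.2, 0.9, 0.5]})
print(combined)
```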