# Qwen3-4B-Instruct-2507 – Capstone MathRL

Fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using a two-stage SFT → GRPO pipeline for mathematical reasoning with calculator tool use.
Author: Mohammad Rafi
## Base Model

- Model: Qwen/Qwen3-4B-Instruct-2507
- Parameters: 4B
- Context length: 32k tokens
## SFT Adapter (`sft_adapter/`)
| Parameter | Value |
|---|---|
| Method | LoRA (Supervised Fine-Tuning) |
| LoRA rank | 32 |
| Epochs | 2 |
| Training samples | 500 |
| Task | Math reasoning (GSM8K + NuminaMath) |
| Size | 270.92 MB |
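For a sense of what the rank-32 setting in the table implies: LoRA adds an r × d_in down-projection and a d_out × r up-projection per adapted weight matrix, so the trainable parameter count per layer is r·(d_in + d_out). A quick sketch (the layer dimensions below are illustrative, not taken from the model config):

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer:
    an r x d_in down-projection (A) plus a d_out x r up-projection (B)."""
    return r * d_in + d_out * r

# Illustrative: a square 2560 x 2560 projection with the rank-32 setting above.
params = lora_param_count(2560, 2560, 32)  # 163,840 trainable parameters
```

This is why the adapter checkpoints above are a few hundred MB rather than the multi-GB size of the full 4B model.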
## GRPO Adapter (`grpo_adapter/`)
| Parameter | Value |
|---|---|
| Method | GRPO (Group Relative Policy Optimization) |
| Training samples | 400 |
| Group size | 8 |
| Learning rate | 3e-6 |
| Substeps | 1 |
| Curriculum | easy → intermediate → hard |
| Size | 270.92 MB |
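The group size of 8 above refers to GRPO's core mechanic: for each prompt, a group of completions is sampled and each completion's reward is normalized against the group's mean and standard deviation, removing the need for a learned value model. A minimal sketch of that advantage computation (the reward values are illustrative, not from this training run):

```python
# Group-relative advantage as used in GRPO: normalize each sampled
# completion's reward against its own group's statistics.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Return (r_i - mean) / std for each reward in one sampled group."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: a group of 8 sampled solutions (matching the group size above),
# with reward 1.0 when the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Correct completions receive positive advantage and incorrect ones negative, so the policy update pushes probability toward the better solutions within each group.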
**Recommended:** use `grpo_adapter/`, which was trained through the full SFT + GRPO pipeline.
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Load the GRPO adapter (recommended)
model = PeftModel.from_pretrained(
    base,
    "MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL",
    subfolder="grpo_adapter",
)
model = model.merge_and_unload()

# Or load the SFT adapter only:
# model = PeftModel.from_pretrained(
#     base,
#     "MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL",
#     subfolder="sft_adapter",
# )
# model = model.merge_and_unload()
```
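The model was trained for calculator tool use, but this card does not document the exact tool-call format. As a hypothetical illustration only, assuming a `<calc>expression</calc>` tag convention, the inference-side tool loop could look like: scan the generated text for calculator spans, evaluate the arithmetic safely (via the `ast` module rather than `eval`), and splice the result back into the context.

```python
import ast
import operator
import re

# Safe arithmetic evaluator: only numeric constants and basic operators.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def _eval_node(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp):
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp):
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("unsupported expression")

def run_calculator(text):
    """Replace each <calc>expr</calc> span with its evaluated result.
    The <calc> tag is an assumed format, not confirmed by this model card."""
    def repl(match):
        value = _eval_node(ast.parse(match.group(1), mode="eval").body)
        return str(value)
    return re.sub(r"<calc>(.*?)</calc>", repl, text)
```

Check the adapter's chat template or training code for the actual tool-call markup before relying on any particular format.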