MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL
Reinforcement Learning
PEFT
Safetensors
lora
sft
grpo
math
tool-use
main
568 MB
1 contributor
History: 11 commits
Latest commit: MohammadRafiML, "Update model card: base model + SFT + GRPO adapter details" (d33f2b6, verified), 2 days ago
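| Name | Scan | Size | Last commit | Updated |
| --- | --- | --- | --- | --- |
| grpo_adapter | - | - | Upload adapter_model.safetensors | 2 days ago |
| sft_adapter | - | - | Upload adapter_model.safetensors | 2 days ago |
| .gitattributes | Safe | 1.52 kB | initial commit | 2 days ago |
| README.md | Safe | 1.82 kB | Update model card: base model + SFT + GRPO adapter details | 2 days ago |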