MohammadRafiML/Qwen3-4B-Instruct-2507-Capstone-MathRL

Reinforcement Learning
PEFT
Safetensors
lora
sft
grpo
math
tool-use
568 MB
  • 1 contributor
History: 11 commits
MohammadRafiML
Update model card: base model + SFT + GRPO adapter details
d33f2b6 verified 2 days ago
  • grpo_adapter
    Upload adapter_model.safetensors 2 days ago
  • sft_adapter
    Upload adapter_model.safetensors 2 days ago
  • .gitattributes
    1.52 kB
    initial commit 2 days ago
  • README.md
    1.82 kB
    Update model card: base model + SFT + GRPO adapter details 2 days ago