Instructions to use TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751")
model = AutoModelForCausalLM.from_pretrained("TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751
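The curl request to the vLLM server shown above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the host, port, and sampling values simply mirror the curl example rather than any recommended setting.

```python
import json
import urllib.request

MODEL_ID = "TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint.

    Hypothetical helper: the defaults copy the curl example, nothing more.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Assumes a server is already running at the given base_url.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` would return the generated continuation; for the SGLang server, pass `base_url="http://localhost:30000"` instead.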
- SGLang
How to use TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751 with Docker Model Runner:
docker model run hf.co/TingchenFu/sft_8k_qwen-2.5-math-1.5b_05021751
Model Card
SFT for mathematical reasoning in our MathIF project.
GitHub repository: https://github.com/TingchenFu/MathIF
Training Details
We base our experiments on the DeepScaler dataset, which contains approximately 40k math reasoning samples. We first distill long CoT reasoning traces from QwQ-32B, filtering out samples where QwQ-32B fails to generate a correct answer or the CoT exceeds 8192 tokens. This results in 18k high-quality examples.
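The filtering step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the `DistilledSample` fields, the exact-match correctness check, and the whitespace-based `count_tokens` stand-in are all hypothetical. In practice the token count would come from the model's tokenizer and correctness from a math-answer verifier.

```python
from dataclasses import dataclass

MAX_COT_TOKENS = 8192  # length limit stated in the text above


@dataclass
class DistilledSample:
    """Hypothetical record for one distilled reasoning trace."""
    question: str
    reference_answer: str
    cot: str               # chain-of-thought text produced by the teacher
    predicted_answer: str  # final answer extracted from the trace


def count_tokens(text):
    """Stand-in token counter (whitespace split); an assumption, not a real tokenizer."""
    return len(text.split())


def is_correct(sample):
    """Assumed correctness check: exact string match after stripping whitespace."""
    return sample.predicted_answer.strip() == sample.reference_answer.strip()


def keep_sample(sample, max_tokens=MAX_COT_TOKENS, token_counter=count_tokens):
    """Keep a sample only if its answer is correct and its CoT fits the limit."""
    return is_correct(sample) and token_counter(sample.cot) <= max_tokens


def filter_samples(samples, max_tokens=MAX_COT_TOKENS, token_counter=count_tokens):
    """Return the samples that pass both filters, preserving order."""
    return [s for s in samples if keep_sample(s, max_tokens, token_counter)]
```

Passing a different `token_counter` (for example, one backed by the teacher model's tokenizer) changes only how length is measured; the two filtering conditions stay the same.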
Training is conducted on 16 NVIDIA H100 GPUs. For reinforcement learning, we adopt the GRPO framework with verifiable outcome-based rewards. The model is trained with the VeRL framework, with most hyperparameters left at their default settings.
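To make the GRPO terminology above concrete, here is a small sketch of an outcome-based reward and the group-relative advantage as it is commonly formulated: each sampled response's reward is normalized by the mean and standard deviation of its group. This is not VeRL's implementation; the function names, the exact-match reward, and the `eps` stabilizer are assumptions made for illustration.

```python
import statistics


def outcome_reward(predicted_answer, reference_answer):
    """Assumed verifiable reward: 1.0 for an exact-match final answer, else 0.0."""
    return 1.0 if predicted_answer.strip() == reference_answer.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    The eps term (an assumption here) avoids division by zero when every
    response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group whose rewards are all equal, every advantage is zero, so such a group contributes no learning signal under this formulation.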
Evaluation
We decode with nucleus sampling (T=1.0, p=0.95) and a maximum generation length of 16,384 tokens, using the vLLM engine for efficient inference.
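For readers unfamiliar with the decoding settings above, the following pure-Python sketch shows what temperature T and nucleus threshold p do to a single next-token distribution. It is a didactic illustration, not the vLLM implementation; the function names are hypothetical, and the defaults simply mirror the T=1.0, p=0.95 setting.

```python
import math
import random


def nucleus_filter(logits, temperature=1.0, top_p=0.95):
    """Return {token_id: probability} restricted to the nucleus.

    Steps: scale logits by 1/temperature, apply softmax, keep the smallest set
    of highest-probability tokens whose cumulative mass reaches top_p, then
    renormalize over the kept tokens.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=0.95, rng=random):
    """Draw one token id from the nucleus distribution."""
    dist = nucleus_filter(logits, temperature, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

With T=1.0 the logits are left unscaled, so only the top-p truncation alters the model's distribution: the lowest-probability tail is dropped before sampling.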
Citation
BibTeX:
@article{fu2025scaling,
title={Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models},
author={Fu, Tingchen and Gu, Jiawei and Li, Yafu and Qu, Xiaoye and Cheng, Yu},
journal={arXiv preprint arXiv:2505.14810},
year={2025}
}