MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy

Overview

The model generates <rationale>–<problem> pairs, where:

<rationale>: structured reasoning describing concept integration and difficulty design.
<problem>: a single Olympiad-level mathematical question that admits a verifiable numeric or symbolic answer.

MathSmith-HC combines complexity and consistency as difficulty rewards, producing more stable problems than MathSmith-Hard.
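As a minimal sketch of how a generated pair might be split into its two parts, the snippet below extracts the tagged fields with regular expressions. The tag names come from the description above; the presence of matching closing tags (`</rationale>`, `</problem>`), the helper name `parse_generation`, and the rule of rejecting outputs with missing, empty, or repeated tags are assumptions made for illustration, not the authors' implementation.

```python
import re

# Tag names follow the model card; closing tags and regex parsing are assumptions.
_TAG_PATTERNS = {
    name: re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    for name in ("rationale", "problem")
}


def parse_generation(text):
    """Return {'rationale': ..., 'problem': ...}, or None if the output is malformed.

    An output counts as malformed here when either tag is missing,
    appears more than once, or wraps only whitespace.
    """
    fields = {}
    for name, pattern in _TAG_PATTERNS.items():
        matches = pattern.findall(text)
        if len(matches) != 1:
            return None
        fields[name] = matches[0].strip()
    if not all(fields.values()):
        return None
    return fields
```

A check of this kind could also serve as a simple structural filter before any downstream scoring, since it rejects generations that do not contain exactly one rationale and one problem.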

MathSmith Pipeline

The MathSmith framework consists of four main stages:

Concept Collection: Randomly sample concept–explanation pairs from PlanetMath to ensure data independence.
Supervised Fine-tuning (SFT): Train the model on collected concept–explanation pairs to establish foundational understanding.
Reinforcement Learning (RL): Optimize the model using GRPO with rewards based on:
- Structural validity
- Reasoning complexity
- Answer consistency
Weakness-Focused Self-Improvement: Iteratively identify and address model weaknesses by generating targeted problem variants.
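The reward signals listed for the RL stage can be illustrated with a small sketch. GRPO scores each sampled output relative to the other outputs drawn for the same prompt, typically by subtracting the group mean reward and dividing by the group standard deviation. Everything else below is an assumption for illustration: the function names, the use of structural validity as a hard gate, the equal weights, and the convention that complexity and consistency are already scaled to [0, 1]. The card does not specify how the rewards are actually computed or combined.

```python
from statistics import mean, pstdev


def combined_reward(is_valid, complexity, consistency,
                    w_complexity=0.5, w_consistency=0.5):
    """Combine the three reward signals into one scalar (hypothetical scheme).

    is_valid:    structural validity, treated here as a hard gate
    complexity:  reasoning-complexity score, assumed to lie in [0, 1]
    consistency: answer-consistency score, assumed to lie in [0, 1]
    """
    if not is_valid:
        return 0.0
    return w_complexity * complexity + w_consistency * consistency


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical samples generated for the same prompt.
rewards = [
    combined_reward(True, 0.8, 0.6),
    combined_reward(False, 1.0, 1.0),  # malformed output earns no reward
    combined_reward(True, 0.2, 1.0),
]
advantages = group_advantages(rewards)
```

With this normalization, a sample is reinforced only to the extent that it beats its siblings in the group, so a group of identical rewards yields zero advantage for every member.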

Dependencies

Transformers 4.52.4
PyTorch 2.7.0+cu126
Datasets 3.6.0
Tokenizers 0.21.1
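To compare a local environment against the versions listed above, a small check along the following lines may help. It uses only the standard library's `importlib.metadata`. The helper name `find_mismatches` and the `PINNED` mapping are hypothetical, and the sketch assumes the packages are installed under their usual distribution names (`transformers`, `torch`, `datasets`, `tokenizers`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; "torch" is PyTorch's distribution name.
PINNED = {
    "transformers": "4.52.4",
    "torch": "2.7.0+cu126",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def find_mismatches(pinned, get_version=version):
    """Map each mismatched package to (expected, installed).

    installed is None when the package is not installed at all.
    get_version is injectable so the check can be exercised without
    the packages being present.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when the environment matches the listed versions exactly. Other versions may well work; the check only reports differences from the configuration stated here.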

Citation

If you find this work useful, please cite:

@article{zhan2025mathsmith,
  title={MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy},
  author={Zhan, Shaoxiong and Lai, Yanlin and Lu, Ziyu and Lin, Dahua and Yang, Ziqing and Tan, Fei},
  journal={arXiv preprint arXiv:2508.05592},
  year={2025}
}

Model size: 8B params (Safetensors)
Tensor type: BF16

Model tree for Jasaxion/MathSmith-HC-Problem-Synthesizer-Qwen3-8B

Base model: Qwen/Qwen3-8B-Base
Fine-tuned: Qwen/Qwen3-8B
Fine-tuned: this model
