II-Thought-1.5B-Preview
Overview
II-Thought-1.5B-Preview is a Reinforcement Learning-enhanced language model trained on a subset of II-Thought-RL-v0, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled 50K math subset (dataset link).
Training Methodology
- Framework: ii_thought / verl
- Algorithm: GRPO
- Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Reward Modeling (a hedged sketch follows below):
  - Answer correctness reward
  - Format correctness reward
  - Final reward function
For a deeper look into the implementation details, refer to our repository: Intelligent-Internet/ii-thought.
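To make the reward structure concrete, here is a minimal sketch of how a composite reward and GRPO's group-relative advantages might be computed. The helper names, the format check, and the weighting are illustrative assumptions, not the actual implementation (which lives in the repository above):

```python
import re
import numpy as np

def format_reward(response: str) -> float:
    # Hypothetical format check: reasoning wrapped in <think>...</think>
    # followed by a \boxed{} answer. The real check is in ii-thought.
    has_think = re.search(r"<think>.*?</think>", response, re.DOTALL) is not None
    has_boxed = r"\boxed{" in response
    return 1.0 if has_think and has_boxed else 0.0

def answer_reward(response: str, ground_truth: str) -> float:
    # Hypothetical grader: exact match on the last \boxed{} content.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == ground_truth.strip() else 0.0

def final_reward(response: str, ground_truth: str, w_fmt: float = 0.1) -> float:
    # Assumed weighted sum of the two components (weights are illustrative).
    return w_fmt * format_reward(response) + (1.0 - w_fmt) * answer_reward(response, ground_truth)

def grpo_advantages(rewards: list[float]) -> np.ndarray:
    # GRPO normalizes each sampled response's reward by the statistics
    # of its group of rollouts for the same prompt.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)
```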
Evaluation Results
We used EvalScope to evaluate the models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem is as follows:
- 64 responses: AMC23, AIME24, AIME25
- 4 responses: Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English
- 1 response: IFEval
Sampling Configs:
- Max context length: 32,768
- Temperature: 0.6
- Top-p: 0.95
- Top-k: 40
- Seed: 42
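For illustration, the sketch below shows how Pass@1 can be estimated by averaging correctness over n sampled responses under these settings. This is not EvalScope's implementation; `check_answer` is a hypothetical grader:

```python
from vllm import LLM, SamplingParams

# Sampling settings from the list above; n is the per-problem response count
# (64 for AMC23/AIME24/AIME25, 4 or 1 for the other benchmarks).
params = SamplingParams(
    n=64,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    seed=42,
    max_tokens=32768,
)

llm = LLM(model="Intelligent-Internet/II-Thought-1.5B-Preview")

def pass_at_1(problems, check_answer) -> float:
    # Pass@1 averaged over n samples: mean correctness across all responses.
    total, correct = 0, 0
    for prob in problems:
        outputs = llm.generate([prob["prompt"]], params)[0].outputs
        for out in outputs:
            total += 1
            correct += int(check_answer(out.text, prob["answer"]))
    return correct / total
```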
Additionally, for LiveCodeBench, we leverage QWQ-Evaluation to reproduce results, using a max context length of 32,768 and averaging over 8 runs.
| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|---|---|---|---|
| AMC23 | 69.69 | 54.26 | 79.77 |
| AIME24 | 29.43 | 10.73 | 34.17 |
| AIME25 | 23.39 | 8.8 | 26.09 |
| Olympiad Bench | 43.15 | 36.07 | 52.78 |
| Math500 | 83.6 | 73.15 | 87.2 |
| Math Gaokao 2023 English | 72.99 | 62.47 | 77.21 |
| Minerva Math | 27.57 | 24.45 | 30.79 |
| Vietnamese Entrance Math Exam | 40.32 | 26.69 | 46.24 |
| LiveCodeBench | 16.66 | 2.6 | 19.84 |
| IFEval | 44.24 | 27.22 | 44.84 |
| Average | 45.10 | 32.64 | 49.90 |
How To Use
Our model can be used in the same manner as Qwen or DeepSeek-R1-Distill models.
For instance, you can easily start a service using vLLM:
```bash
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
```
You can also start a service using SGLang:

```bash
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
```
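Once a server is running, you can query it through the OpenAI-compatible API that both vLLM and SGLang expose. A minimal sketch, assuming vLLM's default port 8000 (SGLang uses a different default; adjust `base_url` to your deployment):

```python
from openai import OpenAI

# Both vLLM and SGLang serve an OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Intelligent-Internet/II-Thought-1.5B-Preview",
    messages=[{
        "role": "user",
        "content": "Please reason step by step, and put your final answer "
                   "within \\boxed{}. What is 17 * 24?",
    }],
    temperature=0.6,  # recommended sampling parameters (see Usage Guidelines)
    top_p=0.95,
)
print(response.choices[0].message.content)
```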
Usage Guidelines
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and format the final answer within `\boxed{}` (e.g., "Please reason step by step, and put your final answer within \boxed{}.").
Citation
```bibtex
@misc{2025iithought,
  title={II-Thought: A Large-Scale, High-Quality Reasoning Dataset},
  author={Intelligent Internet},
  year={2025}
}
```