# Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning

## Introduction
Bingo is a reinforcement learning (RL) framework designed to improve the reasoning efficiency of large language models (LLMs). It introduces two key mechanisms:
- **Significance-aware length reward**: gradually reduces only insignificant tokens while preserving essential reasoning steps.
- **Dynamic length reward**: encourages detailed reasoning for hard problems early in training, then decays to promote concise outputs.
This approach achieves a favorable balance between accuracy and efficiency, outperforming vanilla rewards and prior length-based reward baselines.
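To make the interplay of the two mechanisms concrete, here is a toy sketch of such a shaped length reward. This is an illustrative assumption, not the paper's exact formulation: `significance` stands in for a per-response score of how many tokens carry essential reasoning, and the linear warmup/ramp schedule is hypothetical.

```python
def length_reward(tokens_used, budget, significance, step, total_steps, warmup=0.3):
    """Toy sketch of a Bingo-style length reward (hypothetical formula).

    - Significance-aware: only the fraction of tokens judged insignificant
      (1 - significance) counts toward the length penalty.
    - Dynamic: the penalty is disabled during an early warmup phase, so long,
      detailed reasoning is not punished on hard problems, then ramps up
      linearly to push the policy toward concise outputs.
    """
    # Training progress after the warmup phase, clipped to [0, 1].
    progress = max(0.0, (step / total_steps - warmup) / (1.0 - warmup))
    progress = min(1.0, progress)
    # Only insignificant tokens beyond the budget are penalized.
    insignificant = tokens_used * (1.0 - significance)
    overshoot = max(0.0, insignificant - budget)
    return -progress * overshoot / budget
```

During warmup the reward is zero regardless of length, while late in training a verbose response whose tokens are mostly insignificant receives a strong negative reward; a response of the same length with high significance is left untouched.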
## Checkpoints
The released checkpoints are trained from DeepSeek-R1-Distill-Qwen-1.5B and target reasoning-intensive tasks:
- **Bingo-A**: accuracy-preferred checkpoint, selected at peak validation accuracy.
- **Bingo-E**: efficiency-preferred checkpoint, selected once response length stabilizes.
The checkpoints correspond to the folders `r1_1.5b_Bingo_A` and `r1_1.5b_Bingo_E`.
License: MIT
## Citation
If you use these models, please cite:
```bibtex
@article{liu2025bingo,
  title   = {Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning},
  author  = {Liu, Hanbing and Cao, Lang and Ren, Yuanyi and Zhou, Mengyu and Dong, Haoyu and Ma, Xiaojun and Han, Shi and Zhang, Dongmei},
  journal = {arXiv preprint arXiv:2506.08125},
  year    = {2025}
}
```