
Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning

πŸ“„ Paperβ€ƒπŸ’» Code


Introduction

Bingo is a reinforcement learning (RL) framework designed to improve the efficiency of reasoning in large language models.
It introduces two key mechanisms:

  • Significance-aware length reward: Gradually reduces only insignificant tokens while preserving essential reasoning steps.
  • Dynamic length reward: Encourages detailed reasoning for hard problems in early training, then decays to promote concise outputs.

This approach achieves a favorable balance between accuracy and efficiency, outperforming both vanilla accuracy-only rewards and prior length-based reward baselines.
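The two mechanisms can be combined into a single scalar reward. The sketch below is an illustrative assumption, not the paper's exact formulation: the function name, the linear decay schedule, and the `max_len` normalization are all placeholders chosen for clarity. It penalizes only the insignificant-token count, and ramps that penalty in over training so long exploratory reasoning is not punished early on.

```python
def bingo_style_reward(correct: bool,
                       n_insignificant: int,
                       step: int,
                       decay_steps: int = 2000,
                       max_len: int = 4096) -> float:
    """Illustrative dynamic, significance-aware length reward.

    - Significance-aware: only insignificant tokens are penalized,
      so essential reasoning steps carry no length cost.
    - Dynamic: the penalty weight grows linearly from 0 to 1 over
      `decay_steps`, encouraging detailed reasoning early and
      concise outputs late in training.
    """
    acc = 1.0 if correct else 0.0
    # Fraction of the length budget spent on insignificant tokens.
    frac_insig = n_insignificant / max_len
    # Training-dependent weight: no length pressure at step 0,
    # full pressure once step >= decay_steps.
    weight = min(1.0, step / decay_steps)
    return acc - weight * frac_insig
```

Early in training the reward reduces to plain accuracy; later, two correct answers are ranked by how few insignificant tokens they spend.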


Checkpoints

The released checkpoints are trained from DeepSeek-R1-Distill-Qwen-1.5B and target reasoning-intensive tasks:

  • Bingo-A 🟒: Accuracy-preferred checkpoint, selected at peak validation accuracy.
  • Bingo-E ⚡: Efficiency-preferred checkpoint, selected when response length stabilizes.

Checkpoints correspond to the folders r1_1.5b_Bingo_A and r1_1.5b_Bingo_E.
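A minimal loading sketch, assuming the checkpoint folders are in standard Hugging Face format; the `select_checkpoint` helper is a hypothetical convenience, not part of the release:

```python
def select_checkpoint(prefer: str) -> str:
    """Map a preference to the released checkpoint folder:
    'accuracy'   -> Bingo-A (selected at peak validation accuracy)
    'efficiency' -> Bingo-E (selected when response length stabilizes)
    """
    folders = {"accuracy": "r1_1.5b_Bingo_A", "efficiency": "r1_1.5b_Bingo_E"}
    return folders[prefer]


if __name__ == "__main__":
    # Assumes standard Hugging Face checkpoint layout; point
    # `from_pretrained` at the checkpoint folder directly.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = select_checkpoint("efficiency")
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto")
```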


License: MIT


Citation

If you use these models, please cite:

@article{liu2025bingo,
    title   = {Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning},
    author  = {Liu, Hanbing and Cao, Lang and Ren, Yuanyi and Zhou, Mengyu and Dong, Haoyu and Ma, Xiaojun and Han, Shi and Zhang, Dongmei},
    journal = {arXiv preprint arXiv:2506.08125},
    year    = {2025}
}