Papers
arxiv:2507.07451

RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning

Published on Jul 10
· Submitted by hongzhizhang on Jul 17

Abstract

AI-generated summary: RLEP, a reinforcement learning framework with experience replay, enhances large language model training by focusing on high-quality examples, leading to faster convergence and improved performance on math-related benchmarks.

Reinforcement learning (RL) for large language models is an energy-intensive endeavor: training can be unstable, and the policy may gradually drift away from its pretrained weights. We present RLEP (Reinforcement Learning with Experience rePlay), a two-phase framework that first collects verified trajectories and then replays them during subsequent training. At every update step, the policy is optimized on mini-batches that blend newly generated rollouts with these replayed successes. By replaying high-quality examples, RLEP steers the model away from fruitless exploration, focuses learning on promising reasoning paths, and delivers both faster convergence and stronger final performance. On the Qwen2.5-Math-7B base model, RLEP reaches baseline peak accuracy with substantially fewer updates and ultimately surpasses it, improving accuracy on AIME-2024 from 38.2% to 39.9%, on AIME-2025 from 19.8% to 22.3%, and on AMC-2023 from 77.0% to 82.2%. Our code, datasets, and checkpoints are publicly available at https://github.com/Kwai-Klear/RLEP to facilitate reproducibility and further research.
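
The two-phase recipe in the abstract can be pictured with a short sketch. This is a minimal illustration under assumptions, not the authors' released code: `gen_rollouts`, `verify_answer`, and `policy_update` are hypothetical stand-ins for rollout generation, answer verification, and a DAPO/GRPO-style optimizer step, and `replay_ratio` is an assumed mixing knob.

```python
import random

def collect_verified(policy, prompts, gen_rollouts, verify_answer):
    """Phase 1 (sketch): keep only rollouts whose final answers verify as correct."""
    buffer = []
    for prompt in prompts:
        for rollout in gen_rollouts(policy, [prompt]):
            if verify_answer(rollout):
                buffer.append(rollout)
    return buffer

def rlep_update_step(policy, prompts, replay_buffer,
                     gen_rollouts, policy_update, replay_ratio=0.25):
    """Phase 2 (sketch): optimize on a mini-batch blending fresh rollouts
    with replayed verified successes."""
    fresh = gen_rollouts(policy, prompts)                    # new exploration
    n_replay = min(int(replay_ratio * len(fresh)), len(replay_buffer))
    replayed = random.sample(replay_buffer, k=n_replay)      # high-quality anchors
    policy_update(policy, fresh + replayed)                  # e.g. a DAPO-style update
```

In this reading, the replayed successes keep gradient signal flowing toward known-good reasoning paths while the fresh rollouts continue to explore, which is how the paper explains the faster convergence and higher final accuracy.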

Community

Paper author and submitter:

Our recent study demonstrates that harnessing collected experience can boost final performance. Key takeaways:

  • A well‑tuned DAPO baseline
  • Further gains achieved by incorporating RLEP


Models citing this paper: 1

Datasets citing this paper: 1

Spaces citing this paper: 0

Collections including this paper: 1