Papers
arXiv:2511.01937

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

Published on Nov 2
· Submitted by Abdelaziz Bounhar on Nov 5
Abstract

Retaining and up-weighting moderately easy problems in RLVR pipelines for LLMs reduces output verbosity without explicit length penalization.

AI-generated summary

Large language models (LLMs) trained for step-by-step reasoning often become excessively verbose, raising inference cost. Standard Reinforcement Learning with Verifiable Rewards (RLVR) pipelines filter out "easy" problems for training efficiency, leaving the model to train primarily on harder problems that require longer reasoning chains. This skews the output length distribution upward, resulting in a model that conflates "thinking longer" with "thinking better". In this work, we show that retaining and modestly up-weighting moderately easy problems acts as an implicit length regularizer. Exposing the model to solvable short-chain tasks constrains its output distribution and prevents runaway verbosity. The result is *emergent brevity for free*: the model learns to solve harder problems without inflating the output length, despite the absence of any explicit length penalization. RLVR experiments using this approach on Qwen3-4B-Thinking-2507 (with a 16k token limit) achieve baseline pass@1 AIME25 accuracy while generating solutions that are, on average, nearly twice as short. The code is available on GitHub (https://github.com/MBZUAI-Paris/Frugal-AI), with datasets and models on Hugging Face (https://huggingface.co/collections/MBZUAI-Paris/k2-think-mini-68dcfa8b114686a4bd3dc2bc).
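The core idea — keep moderately easy problems in the training mix and up-weight them instead of filtering them out — can be sketched as a data-mix weighting step. This is a minimal illustration, not the paper's implementation: the pass-rate band, the up-weight factor, and the `pass_rate` field are hypothetical placeholders.

```python
import random

def build_training_mix(problems, easy_band=(0.7, 0.95), easy_weight=2.0):
    """Assign sampling weights to problems by estimated pass rate.

    Standard RLVR pipelines discard problems the model already solves
    reliably; this sketch instead keeps moderately easy problems (pass
    rate inside `easy_band`) and up-weights them, so short correct
    reasoning chains stay in the training distribution. The band and
    weight are illustrative values, not the paper's settings.
    """
    weights = []
    for p in problems:
        rate = p["pass_rate"]  # fraction solved in a probe rollout (assumed field)
        if rate >= easy_band[1]:
            weights.append(0.0)          # trivially solved: drop
        elif rate >= easy_band[0]:
            weights.append(easy_weight)  # moderately easy: up-weight
        else:
            weights.append(1.0)          # hard: keep at base weight
    return weights

def sample_batch(problems, weights, k, rng=random):
    """Draw a training batch according to the mix weights."""
    kept = [(p, w) for p, w in zip(problems, weights) if w > 0]
    pool, pool_weights = zip(*kept)
    return rng.choices(pool, weights=pool_weights, k=k)
```

Because the moderately easy problems reward short correct chains, their presence in every batch acts as the implicit length regularizer the abstract describes, without any explicit length penalty in the reward.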

Community

Paper author Paper submitter

TL;DR: 🤖 Faster. Smarter. Frugal. and BETTER!
Our open-source RL-trained math model reduces verbosity by ~2× without losing accuracy (actually improving on some hard reasoning benchmarks like Omni-Hard), showing that easy problems can implicitly regularize length during RL.

Code is publicly available on GitHub.

Model and Data are publicly available on Hugging Face.


Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 0
Collections including this paper 1