Abstract
Deep Self-Evolving Reasoning (DSER) extends the reasoning capabilities of smaller models by treating iterative solution refinement as a Markov chain, enabling them to solve previously unsolvable problems and surpass larger models in accuracy.
Long-form chain-of-thought reasoning has become a cornerstone of advanced problem solving in large language models. While recent verification-refinement frameworks have enabled proprietary models to solve Olympiad-level problems, their effectiveness hinges on strong, reliable verification and correction capabilities, which remain fragile in open-weight, smaller-scale models. This work demonstrates that even with weak verification and refinement capabilities on hard tasks, the reasoning limits of such models can be substantially extended through a probabilistic paradigm we call Deep Self-Evolving Reasoning (DSER). We conceptualize iterative reasoning as a Markov chain, where each step represents a stochastic transition in the solution space. The key insight is that convergence to a correct solution is guaranteed as long as the probability of improvement marginally exceeds that of degradation. By running multiple long-horizon, self-evolving processes in parallel, DSER amplifies these small positive tendencies, enabling the model to asymptotically approach correct answers. Empirically, we apply DSER to the DeepSeek-R1-0528-Qwen3-8B model. On the challenging AIME 2024-2025 benchmark, DSER solves 5 out of 9 previously unsolvable problems and boosts overall performance, enabling this compact model to surpass the single-turn accuracy of its 600B-parameter teacher through majority voting. Beyond its immediate utility for test-time scaling, the DSER framework serves to diagnose the fundamental limitations of current open-weight reasoners. By delineating their shortcomings in self-verification, refinement, and stability, our findings establish a clear research agenda for developing next-generation models with powerful, intrinsic self-evolving capabilities.
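The convergence claim admits a simple two-state reading, sketched here under stated assumptions (the abstract does not specify the formal model): treat each refinement step as leaving the current solution either correct or incorrect, with improvement probability p and degradation probability q. The stationary distribution then favors the correct state whenever p > q, and majority voting over K parallel chains amplifies even a slim margin:

```latex
% Two-state sketch; p, q, and the binary state abstraction are
% illustrative assumptions, not quantities reported in the paper.
P =
\begin{pmatrix}
  1-q & q   \\ % from "correct":   stay correct, or degrade
  p   & 1-p    % from "incorrect": improve, or stay incorrect
\end{pmatrix},
\qquad
\pi_{\mathrm{correct}} = \frac{p}{p+q} > \tfrac{1}{2}
\iff p > q.
% Hoeffding's inequality then bounds the majority-vote failure
% probability over K independent chains by exp(-2K(pi_correct - 1/2)^2).
```

Under this reading, even a few percentage points of margin between improvement and degradation yield a failure probability that decays exponentially in the number of parallel chains, which is precisely the amplification the abstract attributes to DSER.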
Community
How can a small model reason like a giant? With Deep Self-Evolving Reasoning (DSER), a new paradigm that reframes iterative verification and refinement as a stochastic process. By running multiple parallel, long-horizon "self-evolutions", DSER guides the model to naturally converge on correct solutions. This enabled the compact DeepSeek-R1-0528-Qwen3-8B to solve 5 out of 9 previously unsolvable AIME problems, achieving performance that rivals its 600B-parameter teacher, DeepSeek-R1-0528.
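To make that amplification concrete, below is a minimal Python simulation of the two-state sketch above. The transition probabilities, horizon, and chain count are hypothetical placeholders (the paper reports none of them), and the real method votes over generated answers rather than a correctness flag; this only illustrates how a small per-step improvement margin compounds:

```python
import random

def self_evolve(steps=200, p_improve=0.12, p_degrade=0.10):
    """One long-horizon chain; the state tracks whether the current solution is correct.

    p_improve/p_degrade are hypothetical: the only assumption, as in DSER,
    is that improvement is slightly more likely than degradation.
    """
    correct = False  # hard problems start unsolved
    for _ in range(steps):
        if correct:
            correct = random.random() >= p_degrade  # may degrade back to incorrect
        else:
            correct = random.random() < p_improve   # may improve to correct
    return correct

def dser_majority(num_chains=64, **kwargs):
    """Run independent self-evolving chains in parallel and majority-vote the result."""
    votes = sum(self_evolve(**kwargs) for _ in range(num_chains))
    return votes > num_chains / 2

trials = 100
wins = sum(dser_majority() for _ in range(trials))
# A single chain converges to p/(p+q), about 0.55 here; voting amplifies that margin.
print(f"majority vote correct in {wins}/{trials} trials")
```

Increasing `num_chains` pushes the vote's success rate toward 1 without touching the underlying model, which is the test-time-scaling behavior described above.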
The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs (2025)
- From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs (2025)
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning (2025)
- ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute (2025)
- Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm (2025)
- Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization (2025)
- Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution (2025)