Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
Abstract
Attention mechanisms in LLMs are analyzed to reveal reasoning patterns, leading to novel RL strategies that improve performance by focusing on critical tokens.
The reasoning process of large language models (LLMs) remains opaque, and reinforcement learning (RL) typically applies uniform credit across an entire generation, blurring the distinction between pivotal and routine steps. This work positions attention as a privileged substrate that renders the internal logic of LLMs legible: not merely a byproduct of computation, but a mechanistic blueprint of reasoning itself. We first distinguish attention heads devoted to locally focused information processing from those devoted to globally focused processing, and show that locally focused heads produce a sawtooth pattern near the diagonal indicating phrasal chunks, while globally focused heads expose tokens that exert broad downstream influence over future tokens. We formalize these observations with two metrics: (1) Windowed Average Attention Distance, which measures the extent of backward attention within a clipped window, and (2) Future Attention Influence, which quantifies a token's global importance as the average attention it receives from subsequent tokens. Taken together, these signals reveal a recurring preplan-and-anchor mechanism: the model first performs a long-range contextual reference to generate an introductory token, which is immediately followed by, or coincides with, a semantic anchor token that organizes subsequent reasoning. Leveraging these insights, we introduce three novel RL strategies that dynamically assign targeted credit to critical nodes (preplan tokens, anchor tokens, and their temporal coupling) and show consistent performance gains across various reasoning tasks. By aligning optimization with the model's intrinsic reasoning rhythm, we turn opaque, sequence-level optimization into an actionable, structure-aware process, offering a potential step toward more transparent and effective optimization of LLM reasoning.
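For intuition, below is a minimal NumPy sketch of how the two metrics could be computed from a single head's causal attention matrix. The function names follow the abstract, but the exact formulas, window size, and normalization here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def windowed_average_attention_distance(attn, window=32):
    """Per-token backward attention distance within a clipped window.

    attn: (T, T) causal attention matrix; row t attends over tokens <= t.
    Low values suggest tight phrase-level focus; spikes suggest a long-range
    contextual reference, e.g. at chunk boundaries.
    """
    T = attn.shape[0]
    waad = np.zeros(T)
    for t in range(T):
        lo = max(0, t - window)
        weights = attn[t, lo:t + 1]                    # attention inside the clipped window
        dists = np.arange(t - lo, -1, -1)              # distance t - j for each source token j
        waad[t] = (weights * dists).sum() / (weights.sum() + 1e-9)
    return waad

def future_attention_influence(attn):
    """Per-token average attention received from all subsequent tokens.

    High values mark candidate anchor tokens with broad downstream influence.
    """
    T = attn.shape[0]
    fai = np.zeros(T)
    for j in range(T - 1):
        fai[j] = attn[j + 1:, j].mean()                # mean attention paid back to token j
    return fai
```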
Community
🔥 Core Summary:
🔹 Redefining the Role of Attention: Attention is not just a byproduct of language-model computation but a structured blueprint that reveals the underlying logic of reasoning. By analyzing attention patterns, we can capture the model's "thought process" in information integration and sequence generation more clearly, providing an interpretable framework for what is still largely a black-box reasoning process and making the model's decision-making more transparent.
🔹 Revolutionizing RL Algorithms: By aligning optimization objectives with the model's intrinsic reasoning rhythm, we replace traditional sequence-level rewards, which are spread uniformly across tokens, with dynamic, structure-aware credit assignment over the reasoning process. This mechanism dynamically identifies and strengthens key reasoning steps, driving model optimization toward a more transparent, finer-grained, and more efficient paradigm.
🧠 Key Reasoning Patterns Revealed by Attention
🔹 Local Chunking: Local attention exhibits a characteristic near-diagonal sawtooth pattern, reflecting the model's dense internal construction at the "chunk" level. At chunk boundaries, the model performs long-range contextual retrieval (often accompanied by higher token entropy), and subsequent generation is then guided by this reference.
🔹 Global Anchor Planning: Global attention identifies sparse but crucial anchor tokens that exert broad influence over subsequent tokens and are frequently referenced back by later tokens. Experiments show that perturbing these anchors significantly alters the subsequent reasoning path.
🔹 Preplan-Anchor Coupling Mechanism: A stable temporal coupling exists between the local foresight signal and the global anchor signal, forming a recurring reasoning rhythm: the model first generates a guiding token as a "preplan", then anchors a core semantic node, thereby systematically organizing the subsequent reasoning process (see the detection sketch after this list).
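To make the coupling concrete, the following sketch flags preplan tokens as spikes in the local signal (WAAD), anchor tokens as outliers in the global signal (FAI), and couples them when an anchor appears at or immediately after a preplan position. The z-score thresholds and the maximum gap are illustrative assumptions, not values from the paper.

```python
import numpy as np

def detect_preplan_anchor_coupling(waad, fai, waad_z=1.5, fai_z=1.5, max_gap=1):
    """Heuristic detection of the preplan-and-anchor rhythm from the two signals.

    waad: per-token Windowed Average Attention Distance (local signal)
    fai:  per-token Future Attention Influence (global signal)
    Returns preplan indices, anchor indices, and (preplan, anchor) pairs that
    co-occur within `max_gap` tokens.
    """
    zscore = lambda x: (x - x.mean()) / (x.std() + 1e-9)
    preplan = np.where(zscore(waad) > waad_z)[0]
    anchors = set(np.where(zscore(fai) > fai_z)[0].tolist())
    coupled = [(p, p + g) for p in preplan
               for g in range(max_gap + 1) if p + g in anchors]
    return preplan, sorted(anchors), coupled
```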
⚙️ RL Algorithm Innovation: From Uniform Rewards to Structure-Aware Credit Assignment
Traditional sequence-level rewards are spread evenly across tokens, ignoring the key nodes in the reasoning structure. We propose a dynamic credit-redistribution mechanism based on the attention rhythm, aligning the optimization process with the model's intrinsic reasoning structure. Specifically, we implement three strategies (a minimal sketch of the overall idea follows this list):
🔹 Preplan Guidance Strategy: Strengthens tokens that guide local chunk construction, improving long-range context referencing.
🔹 Anchor Enhancement Strategy: Focuses optimization on semantic anchors with global influence to enhance reasoning planning.
🔹 Coupling Alignment Strategy: Strengthens the temporal coordination between preplanning and anchors, promoting a structured reasoning process.
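As an illustration of what structure-aware credit assignment could look like, the sketch below reweights per-token advantages around the detected preplan and anchor positions. The multiplicative form and the weight values are assumptions chosen for clarity; the paper's three strategies each define their own credit-redistribution rule.

```python
import numpy as np

def structure_aware_advantages(advantages, preplan_idx, anchor_idx, coupled_pairs,
                               w_preplan=1.2, w_anchor=1.3, w_coupled=1.5):
    """Redistribute a uniform sequence-level advantage toward critical tokens.

    advantages:    (T,) per-token advantages, e.g. the sequence advantage
                   broadcast to every token as in GRPO-style training.
    preplan_idx:   indices of preplan tokens (local WAAD spikes)
    anchor_idx:    indices of anchor tokens (high FAI)
    coupled_pairs: (preplan, anchor) pairs with tight temporal coupling
    """
    weights = np.ones_like(advantages, dtype=float)
    weights[list(preplan_idx)] *= w_preplan     # Preplan Guidance: boost guiding tokens
    weights[list(anchor_idx)] *= w_anchor       # Anchor Enhancement: boost semantic anchors
    for p, a in coupled_pairs:                  # Coupling Alignment: extra credit when coupled
        weights[p] *= w_coupled
        weights[a] *= w_coupled
    return advantages * weights
```

In a GRPO-style trainer, such a reweighting would be applied to each sampled response before the policy-gradient loss, so credit on non-critical tokens is left unchanged while preplan, anchor, and coupled positions receive proportionally more.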
The following papers were recommended by the Semantic Scholar API
- Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning (2025)
- Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models (2025)
- Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning (2025)
- Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning (2025)
- LightReasoner: Can Small Language Models Teach Large Language Models Reasoning? (2025)
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents (2025)
- Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training (2025)