Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs
Abstract
Reinforcement learning enhances language models' ability to recall hierarchical knowledge without degrading memorized facts, as evidenced by gains that structured prompting only partially recovers and by superior performance on deep-retrieval tasks.
Reinforcement learning (RL) is often credited with improving language model reasoning and generalization at the expense of degrading memorized knowledge. We challenge this narrative by observing that RL-enhanced models consistently outperform their base and supervised fine-tuned (SFT) counterparts on pure knowledge recall tasks, particularly those requiring traversal of hierarchical, structured knowledge (e.g., medical codes). We hypothesize that these gains stem not from newly acquired data but from improved procedural skill in navigating and searching existing knowledge hierarchies within the model's parameters. To support this hypothesis, we show that structured prompting, which explicitly guides SFT models through hierarchical traversal, recovers most of the performance gap (reducing it from 24pp to 7pp on MedConceptsQA for DeepSeek-V3/R1). We further find that while prompting improves final-answer accuracy, RL-enhanced models retain a superior ability to recall correct procedural paths on deep-retrieval tasks. Finally, our layer-wise internal activation analysis reveals that while factual representations (e.g., activations for the statement "code 57.95 refers to urinary infection") maintain high cosine similarity between SFT and RL models, query representations (e.g., "what is code 57.95") diverge noticeably, indicating that RL primarily transforms how models traverse knowledge rather than the knowledge representations themselves.
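To make the activation analysis concrete, below is a minimal sketch of how one might compute layer-wise cosine similarity between an SFT checkpoint and an RL-enhanced checkpoint for a factual statement versus a query. The checkpoint names, the use of the Hugging Face `transformers` library, and the last-token pooling choice are illustrative assumptions, not the paper's exact setup; the comparison only makes sense because RL fine-tuning keeps the same architecture and tokenizer as the SFT model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint names standing in for the SFT and RL-enhanced models.
SFT_MODEL = "org/sft-checkpoint"  # placeholder, not from the paper
RL_MODEL = "org/rl-checkpoint"    # placeholder, not from the paper

def last_token_states(model_name: str, text: str):
    """Last-token hidden state at every layer of `model_name` for `text`."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return [h[0, -1] for h in out.hidden_states]  # one vector per layer

def layerwise_cosine(text: str):
    """Per-layer cosine similarity between the two models' activations."""
    sft = last_token_states(SFT_MODEL, text)
    rl = last_token_states(RL_MODEL, text)
    return [torch.cosine_similarity(a, b, dim=0).item() for a, b in zip(sft, rl)]

fact = "Code 57.95 refers to urinary infection."  # factual statement
query = "What is code 57.95?"                     # retrieval-style query

print("fact :", layerwise_cosine(fact))   # abstract's finding: high similarity
print("query:", layerwise_cosine(query))  # abstract's finding: noticeable divergence
```

Under the paper's finding, the fact prompt should yield high similarity across layers while the query prompt diverges, consistent with RL changing traversal behavior rather than stored facts.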
Community
Contrary to the common belief that RL degrades memorized knowledge, this paper shows that RL-enhanced LLMs actually improve on knowledge recall tasks, especially when navigating hierarchical structures like medical coding systems. The reason is unexpected: RL appears to teach models better strategies for searching through their existing knowledge rather than adding new facts. Even when the authors guided SFT models with structured prompting to close the gap, a 7pp difference remained, suggesting RL instills navigation skills that can't be easily replicated through prompting alone.
This has real implications for how we evaluate and train future models—if the limiting factor isn't what models know but how efficiently they can retrieve it, we may be overlooking a key dimension of capability. It also opens the door to explicitly training models on internal search and traversal tasks, potentially unlocking latent knowledge that's already there but poorly accessible. The question becomes: how much untapped capability exists in current models simply because they lack effective retrieval mechanisms?
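For intuition, here is a rough sketch of what such structured prompting for hierarchical code traversal could look like. The paper's exact prompts are not reproduced here, so the hierarchy levels and wording below are illustrative assumptions for an ICD-9-style procedure code.

```python
# Illustrative only: flat question vs. a prompt that forces step-by-step traversal
# of the code hierarchy before answering.
flat_prompt = "What does procedure code 57.95 refer to?"

structured_prompt = """Answer by walking down the code hierarchy step by step.
Code: 57.95
1. Which chapter of the coding system contains this code?
2. Which category within that chapter covers the 57.9x codes?
3. Given that category, which specific procedure does 57.95 denote?
State the final answer after completing all three steps."""
```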
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models (2025)
- From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning? (2025)
- Learning to Reason in Structured In-context Environments with Reinforcement Learning (2025)
- MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval (2025)
- Tagging the Thought: Unlocking Personalization Reasoning via Reinforcement Learning (2025)
- Mitigating Forgetting Between Supervised and Reinforcement Learning Yields Stronger Reasoners (2025)
- Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs (2025)