UserRL: Training Interactive User-Centric Agent via Reinforcement Learning
Abstract
The UserRL framework improves user-centric RL agents by optimizing reward assignment and user simulation, showing that these design choices are as important as model scale.
Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet the ultimate value of such agents lies in their ability to assist users, a setting where the diversity and dynamics of user interaction pose challenges. In this work, we propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments paired with simulated users. We systematically vary turn-level reward assignment and trajectory-level score calculation to analyze how different formulations affect learning under the GRPO algorithm. Our experiments across Qwen3 models reveal three key findings: (i) SFT cold start is critical for unlocking initial interaction ability and enabling sustained RL improvements; (ii) deliberate trajectory scoring yields more efficient and effective multi-turn interactions; and (iii) while stronger simulated users (e.g., GPT-4o) facilitate training, open-source simulators (e.g., Qwen3-32B) remain a cost-effective and transferable option. Together, these results highlight that careful design of reward shaping and user simulation choice is as crucial as model scale, and establish UserRL as a practical pathway for developing robust user-centric agentic models. All code and data are publicly available for future research.
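To make the distinction between turn-level rewards and trajectory-level scores concrete, the following is a minimal sketch and not the paper's formulation. It assumes a trajectory score computed as a discounted sum of per-turn rewards (the `gamma` parameter and the function names are hypothetical) and the commonly described GRPO step of standardizing scores within a group of rollouts for the same prompt (population standard deviation is an assumption here).

```python
from statistics import mean, pstdev


def trajectory_score(turn_rewards, gamma=1.0):
    """Collapse per-turn rewards into a single trajectory-level score.

    A discounted sum is only one possible choice (an assumption for
    illustration); with gamma < 1, rewards earned in later turns count less.
    """
    return sum(r * gamma**t for t, r in enumerate(turn_rewards))


def grpo_advantages(scores, eps=1e-8):
    """Group-relative advantages: standardize scores within one rollout group."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population std; eps guards against zero variance
    return [(s - mu) / (sigma + eps) for s in scores]


# Three hypothetical rollouts for the same task, with per-turn rewards.
group = [
    [0.0, 0.0, 1.0],  # succeeds on the third turn
    [0.0, 1.0],       # succeeds on the second turn
    [0.0, 0.0, 0.0],  # never succeeds
]
scores = [trajectory_score(r, gamma=0.9) for r in group]
advantages = grpo_advantages(scores)
```

Under these assumptions, the rollout that succeeds in fewer turns receives the largest advantage, which illustrates how a trajectory scoring choice could favor more efficient interactions.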
Community
We have open-sourced all the gym environments and code, including the training pipeline and data resources!
Link: https://github.com/SalesforceAIResearch/UserRL
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- UserBench: An Interactive Gym Environment for User-Centric Agents (2025)
- MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use (2025)
- Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents (2025)
- AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning (2025)
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use (2025)
- Reinforcement Learning Foundations for Deep Research Systems: A Survey (2025)
- UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning (2025)