Xiao Hu
huxiao09
AI & ML interests
Reinforcement Learning, LLM Reasoning
Recent Activity
authored a paper 20 days ago
Query-Policy Misalignment in Preference-Based Reinforcement Learning
authored a paper 20 days ago
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
authored a paper 20 days ago
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Organizations
None yet