arxiv:2509.24203
Yanxi Chen
yanxi-chen
AI & ML interests
None yet
Recent Activity
authored a paper about 2 months ago
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
authored a paper about 2 months ago
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
upvoted a paper about 2 months ago
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Organizations
None yet