arxiv:2605.07865
Choi
yunhowhour
AI & ML interests
None yet
Recent Activity
authored a paper about 8 hours ago
KL for a KL: On-Policy Distillation with Control Variate Baseline
authored a paper about 8 hours ago
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
upvoted a paper 3 days ago
KL for a KL: On-Policy Distillation with Control Variate Baseline
Organizations
None yet