Haoyuan WU
hywu
AI & ML interests
None yet
Recent Activity
authored a paper 22 days ago
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
upvoted a paper 24 days ago
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
authored a paper about 1 month ago
Reinforcement Learning on Pre-Training Data
Organizations
None yet