chengzhi

lczazu

AI & ML interests

None yet

Recent Activity

upvoted an article about 1 month ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

upvoted a paper 7 months ago

Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation

new activity 9 months ago

unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit: Error when load model

Organizations

None yet

Collections 1

Papers 1

arxiv:2407.12223

Models 1

lczazu/ppo-LunarLander-v2

Reinforcement Learning • Updated Apr 25, 2023 • 6

Datasets 0

None public yet