VLM with GRPO training for vision-grounded decision making (https://arxiv.org/pdf/2503.16965)
Derek Zhe Hu
zhehuderek
AI & ML interests
NLP, Multimodality
Recent Activity
liked a dataset 8 days ago: ProlificAI/social-reasoning-rlhf
updated a dataset about 2 months ago: zhehuderek/humor_understanding_combined
published a dataset about 2 months ago: zhehuderek/humor_understanding_combined
Organizations
None yet