Resources for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Xin Lai
xinlai
AI & ML interests
Multimodal LLM, LLM Reasoning, Point Cloud Segmentation, Image Segmentation
Recent Activity
upvoted a paper 10 days ago
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
upvoted a paper 17 days ago
Scaling RL to Long Videos
upvoted a paper about 1 month ago
MMSearch-R1: Incentivizing LMMs to Search
Organizations
None yet