Xiusi Chen

XtremSup

https://xiusic.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper 20 days ago

Perception-Aware Policy Optimization for Multimodal Reasoning

upvoted a paper about 2 months ago

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

upvoted a paper about 2 months ago

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning


Organizations

XtremSup's activity

upvoted a paper 20 days ago

Perception-Aware Policy Optimization for Multimodal Reasoning

Paper • 2507.06448 • Published 21 days ago • 44

upvoted 2 papers about 2 months ago

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6 • 74

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published May 30 • 15

upvoted 2 papers 2 months ago

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Paper • 2505.22961 • Published May 29 • 8

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs

Paper • 2505.13508 • Published May 16 • 14

upvoted a collection 3 months ago

RM-R1

Collection

RM-R1: Reward Modeling as Reasoning • 16 items • Updated Jun 29 • 8

authored a paper 3 months ago

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 78

upvoted 2 papers 3 months ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5 • 25

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 78

authored a paper 3 months ago

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 45

upvoted 2 papers 3 months ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 33

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 45

liked 3 datasets 4 months ago

liked a dataset 5 months ago

infly/INF-ORM-Preference-Magnitude-80K

Viewer • Updated Dec 5, 2024 • 76k • 42 • 8

liked a model 5 months ago

Skywork/Skywork-Critic-Llama-3.1-8B

Text Generation • Updated Sep 29, 2024 • 538 • 12

liked a dataset 5 months ago

Skywork/Skywork-Reward-Preference-80K-v0.2

Viewer • Updated Oct 25, 2024 • 77k • 531 • 54

upvoted an article 5 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Feb 11 • 54

upvoted a paper 9 months ago

SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

Paper • 2410.14745 • Published Oct 17, 2024 • 48
