R-PRM: Reasoning-Driven Process Reward Modeling
Shuaijie She
kevinpro
AI & ML interests
Reasoning, Chain of Thoughts, Alignment, Factual Consistency, Summarization
Recent Activity
new activity
about 17 hours ago
ByteDance-Seed/Seed-X-PPO-7B: Output truncated without reason
new activity
1 day ago
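ByteDance-Seed/Seed-X-PPO-7B: When the context is very long (e.g., 10,000 characters), the model keeps repeating the same sentence; in addition, the chain of thought cannot be turned off through the prompt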