Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models Paper • 2404.00884 • Published Apr 1, 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Paper • 2402.05808 • Published Feb 8, 2024
A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models Paper • 2303.10420 • Published Mar 18, 2023
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7, 2025