Training Language Models to Self-Correct via Reinforcement Learning • Paper • 2409.12917 • Published Sep 19, 2024 • 141
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL • Paper • 2403.03950 • Published Mar 6, 2024 • 16
Levels of AGI for Operationalizing Progress on the Path to AGI • Paper • 2311.02462 • Published Nov 4, 2023 • 37
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems • Paper • 2201.03916 • Published Jan 11, 2022 • 1
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis • Paper • 2307.12856 • Published Jul 24, 2023 • 36
CLUTR: Curriculum Learning via Unsupervised Task Representation Learning • Paper • 2210.10243 • Published Oct 19, 2022 • 1