Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning Paper • 2504.03380 • Published Apr 4
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research Paper • 2505.11855 • Published May 17 • 10
Stable Language Model Pre-training by Reducing Embedding Variability Paper • 2409.07787 • Published Sep 12, 2024
Cross-lingual Transfer of Reward Models in Multilingual Alignment Paper • 2410.18027 • Published Oct 23, 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference Paper • 2406.06424 • Published Jun 10, 2024 • 16
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 67