Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11 • 45
Skywork/Skywork-Reward-V2-Qwen3-0.6B Text Classification • 0.6B • Updated Jul 6 • 3.54k • 11
Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M Text Classification • 8B • Updated Jul 6 • 1.03k • 17