GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning Paper • 2507.10628 • Published Jul 14 • 1
Lansechen/Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch3-cosine0516-v1 Text Generation • 8B • Updated May 16 • 7
Lansechen/Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0515-v2 Text Generation • 8B • Updated May 16 • 10
Lansechen/Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0515-v1 Text Generation • 8B • Updated May 15 • 6
Lansechen/Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0514-v2 Text Generation • 8B • Updated May 15 • 8
Lansechen/Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0514-v1 Text Generation • 8B • Updated May 15 • 7