Step-DPO
Resources for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

xinlai/DeepSeekMath-RL-Step-DPO • Text Generation • 7B • Updated Jun 28, 2024 • 90 • 2
xinlai/Qwen2-7B-Instruct-Step-DPO • Text Generation • 8B • Updated Jun 29, 2024 • 203 • 3
xinlai/Qwen2-72B-Instruct-Step-DPO • Text Generation • 73B • Updated Jun 28, 2024
xinlai/DeepSeekMath-Base-SFT-Step-DPO • Text Generation • 7B • Updated Jun 28, 2024 • 13