---
license: apache-2.0
datasets:
- PKU-Alignment/align-anything
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
---

This model was trained with Direct Preference Optimization (DPO) using the [Align-Anything](https://github.com/PKU-Alignment/align-anything) framework on the *PKU-Alignment/align-anything* text-to-text dataset, starting from the base model Qwen/Qwen2.5-0.5B-Instruct.

DPO training report: https://api.wandb.ai/links/nlp-amct/uifw66p5
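
For readers unfamiliar with DPO, the per-pair objective can be sketched as below. This is a minimal illustration of the standard DPO loss on scalar log-probabilities, not the Align-Anything implementation; the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions chosen for illustration and are not taken from the training configuration of this model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `beta` is a hypothetical default, not the value used for this model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable form for either sign of margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: zero margin.
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls as the policy raises the likelihood of the chosen response relative to the rejected one.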