zhezi12138/alpaca-7b-iter-3-mixp


zhezi12138 committed on Jan 16 (commit 574b181, parent aad0ac5): Create README.md

---
license: mit
datasets:
- PKU-Alignment/BeaverTails
language:
- en
base_model:
- PKU-Alignment/alpaca-7b-reproduced
---
This model is released to reproduce the results on the Safe-RLHF dataset reported in the paper "The Crucial Role of Samplers in Online Direct Preference Optimization". It is iteration 3 of the DPO-mixp algorithm, trained starting from https://huggingface.co/zhezi12138/alpaca-7b-iter-2-mixp.
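DPO-mixp builds on Direct Preference Optimization (DPO). As a minimal sketch for readers unfamiliar with DPO, the standard per-pair DPO loss is shown below. It does not implement the DPO-mixp sampler or this repository's training code; the function name `dpo_loss`, the default `beta = 0.1`, and the example log-probabilities are illustrative assumptions. Each input is assumed to be the summed token log-probability of a full response under the policy or the reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard per-pair DPO loss: -log(sigmoid(beta * margin)).

    Hypothetical helper for illustration; beta = 0.1 is an assumed default,
    not a value taken from this model's training run.
    """
    # Log-ratio of policy to reference probability for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled preference margin: positive when the policy favors the chosen response
    # more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) = log(1 + exp(-m)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# A policy that shifts probability toward the chosen response gets a lower loss.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss falls below log(2) exactly when the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.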