tpo-alignment/Mistral-Instruct-7B-TPO-y4

alignment-handbook

Generated from Trainer


sahsaeedi committed on Feb 19

Commit 60721f9 · verified · 1 parent: a6a94bb

Update README.md

Files changed (1)

README.md +1 -11


@@ -75,20 +75,10 @@ We used 8xA100 GPUs for model training.
-## Citations
 TPO paper:
 ```
-@article{meng2024simpo,
-  title={{SimPO}: Simple preference optimization with a reference-free reward},
-  author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
-  journal={arXiv preprint arXiv:2405.14734},
-  year={2024}
-}
-```
-UltraFeedback paper:
-```
 @misc{saeidi2025triplepreferenceoptimizationachieving,
       title={Triple Preference Optimization: Achieving Better Alignment using a Single Step Optimization},
       author={Amir Saeidi and Shivanshu Verma and Aswin RRV and Kashif Rasul and Chitta Baral},

+## Citation
 TPO paper:
 ```
 @misc{saeidi2025triplepreferenceoptimizationachieving,
       title={Triple Preference Optimization: Achieving Better Alignment using a Single Step Optimization},
       author={Amir Saeidi and Shivanshu Verma and Aswin RRV and Kashif Rasul and Chitta Baral},