metadata
license: cc-by-nc-4.0
A model made with curated synthetic data and then KTO'd on a small curated set. Total time to train was 4 H100 hours. I quite like the results this gave despite the dataset sizes involved. Its also a lot cheaper to iterate. I plan to hand review the human data I've been using and slowly work that back into the datamix. Additionally, planning to make a focused instruction following KTO set to improve system prompt adherance and steerability.
Use chatML and minP.