Yet Another New Multimodal Fine-Tuning Recipe
In this Hugging Face Cookbook notebook, we demonstrate how to align a multimodal model (VLM) with Mixed Preference Optimization (MPO) using trl.
This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!
We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).
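Here is a minimal sketch of what this looks like with trl, where MPO is configured by passing a list of loss types (and their weights) to the DPO trainer. The model ID and dataset name below are illustrative placeholders, not the ones used later in this notebook:

```python
# A minimal sketch, assuming trl's MPO-enabled DPOTrainer.
# Model and dataset names are placeholders for illustration.
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceM4/idefics2-8b"  # placeholder VLM
model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Preference dataset with chosen vs. rejected multimodal pairs (placeholder).
dataset = load_dataset("openbmb/RLAIF-V-Dataset", split="train")

# MPO mixes several objectives: a preference loss (sigmoid/DPO),
# a quality loss (bco_pair), and a generation loss (sft),
# each weighted separately.
training_args = DPOConfig(
    output_dir="vlm-mpo",
    loss_type=["sigmoid", "bco_pair", "sft"],
    loss_weights=[0.8, 0.2, 1.0],
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```

Combining the losses this way follows the MPO recipe: the preference loss teaches relative ranking between chosen and rejected responses, the quality loss scores each response on its own, and the generation loss keeps the model anchored to the chosen completions.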