Submission DpoTask (LoRA)

LoRA adapter (rank r=8) fine-tuned with Direct Preference Optimization (DPO) on a preference dataset of prompt/chosen/rejected triples. Training ran on CPU.
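To illustrate what DPO optimizes on a prompt/chosen/rejected record, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not this submission's training code. The function name `dpo_loss`, the example record, and `beta=0.1` are assumptions made for illustration. The inputs are summed log-probabilities of each response under the trained policy and under the frozen reference model.

```python
import math

# Hypothetical preference record in the prompt/chosen/rejected layout.
example_record = {
    "prompt": "Explain what a LoRA adapter is.",
    "chosen": "A LoRA adapter adds small low-rank matrices to a frozen model.",
    "rejected": "It is a kind of GPU.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    Each argument is the summed log-probability of a response given the prompt.
    beta=0.1 is an assumed value, not one reported for this submission.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable evaluation of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the chosen response than the reference does.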

  • SHA256: 266b32d824c7c565442e12e4570d6c65df04ae7afe7f75b5bbccb36d85418377
  • Training: 1 epoch, batch size 1
  • Upload time: 2025-07-18T21:54:46+02:00
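The SHA256 value above can be checked against a downloaded file with a short script. The card does not say which file the digest covers, so the path in the usage comment is a hypothetical placeholder, and the helper names `sha256_of_file` and `matches_expected` are introduced here for illustration only.

```python
import hashlib

# Digest listed on this card.
EXPECTED_SHA256 = "266b32d824c7c565442e12e4570d6c65df04ae7afe7f75b5bbccb36d85418377"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected hex string, ignoring case."""
    return sha256_of_file(path) == expected.lower()


# Usage (hypothetical file name; the card does not state which file was hashed):
# print(matches_expected("adapter_model.safetensors"))
```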

Usage

# Load the base model, then attach the LoRA adapter from this repository.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "raniero/submission_dpo_ok_001")
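Once the adapter is loaded, text generation might look like the sketch below. The helper `generate_reply` and the `max_new_tokens=64` default are assumptions for illustration. It also assumes the tokenizer is loaded from the same base checkpoint as the model, which this card does not state explicitly.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=64):
    """Generate a completion for `prompt` and return it as decoded text.

    `model` is the PeftModel loaded above; `tokenizer` is a matching
    Hugging Face tokenizer. The decoded text includes the prompt itself.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (assumes the tokenizer ships with the base checkpoint):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# print(generate_reply(model, tokenizer, "Write a haiku about autumn."))
```

On CPU, generation with a 7B-parameter base model is slow, so a small `max_new_tokens` value is a reasonable starting point.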