Submission DpoTask (LoRA)
LoRA adapter (rank r=8) for meta-llama/Llama-2-7b-hf, fine-tuned with Direct Preference Optimization (DPO) on a preference dataset of prompt/chosen/rejected records. Training ran on CPU.
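For readers unfamiliar with the objective, the standard per-example DPO loss can be sketched in plain Python as below. This illustrates the published formula only; it is not the training code used for this submission. The function name, the example log-probabilities, and the `beta=0.1` default are assumptions, since the card does not state the value of beta.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a completion given the
    prompt, under the trainable policy or the frozen reference model.
    beta=0.1 is an assumed default, not a value taken from this card.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable -log(sigmoid(x)) = log(1 + exp(-x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical log-probabilities for one prompt/chosen/rejected record.
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-13.0, ref_rejected_logp=-14.0)
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy assigns relatively more probability to the chosen completion.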
- SHA256: 266b32d824c7c565442e12e4570d6c65df04ae7afe7f75b5bbccb36d85418377
- Training: 1 epoch, batch size 1
- Upload time: 2025-07-18T21:54:46+02:00
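The SHA256 listed above can be checked against a downloaded file with a short script. This is a minimal sketch: the card does not say which file the digest covers, so the path in the usage comment is a hypothetical placeholder, and the helper names are illustrative.

```python
import hashlib

# Digest copied from this card.
EXPECTED_SHA256 = "266b32d824c7c565442e12e4570d6c65df04ae7afe7f75b5bbccb36d85418377"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage (hypothetical file name; substitute the file the digest refers to):
# print(matches_expected("adapter_model.safetensors"))
```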
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter from this repository.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "raniero/submission_dpo_ok_001")
Model tree for raniero/submission_dpo_ok_001
Base model
meta-llama/Llama-2-7b-hf