GhostWriterLlama-3.2-1B-DPO
- Developed by: Ahmed Shahriar Sakib
- License: Apache 2.0
- Finetuned from model: ahmedshahriar/GhostWriterLlama-3.2-1B (SFT)
- Fine-tuning datasets: ahmedshahriar/llmGhostWriter-dpo (preference pairs), ahmedshahriar/llmGhostWriter (SFT)
- Evaluation results dataset: ahmedshahriar/GhostWriterLlama-3.2-1B-DPO-results
- Use case: writing/ghostwriting assistant aligned via DPO to generate more preferred text.
Description
GhostWriterLlama-3.2-1B-DPO is a preference-aligned variant of ahmedshahriar/GhostWriterLlama-3.2-1B trained with Direct Preference Optimization (DPO) on prompt–(chosen, rejected) pairs to improve stylistic alignment and preference adherence for ghostwriting-style tasks. Training used the Unsloth workflow with Hugging Face TRL’s DPOTrainer for efficiency and compatibility with the Transformers ecosystem.
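For illustration, a DPO preference record pairs one prompt with a chosen and a rejected completion. The snippet below sketches what such a record might look like; the field names follow the convention TRL's DPOTrainer expects, but the exact schema and contents of llmGhostWriter-dpo may differ.

```python
# Illustrative DPO preference record. The "prompt"/"chosen"/"rejected" keys are the
# column names TRL's DPOTrainer expects; the example text is made up and is not
# taken from the ahmedshahriar/llmGhostWriter-dpo dataset.
preference_pair = {
    "prompt": "Write a short blog intro about remote work productivity.",
    "chosen": "Remote work has quietly rewritten the rules of productivity...",
    "rejected": "Remote work is when people work from home. It has pros and cons.",
}
```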
Training Details
- Method: SFT → DPO (preference optimization with chosen vs. rejected responses)
- Frameworks: Unsloth + Hugging Face TRL (DPOTrainer)
- Data: llmGhostWriter (~2.1k instruction–response pairs) for SFT; llmGhostWriter-dpo (~1.2k prompt–preference pairs) for DPO fine-tuning
- Objective: Improve preference alignment and stylistic suitability for blog/social content
- Infrastructure & resources: This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
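A minimal sketch of the DPO stage with plain TRL, assuming the SFT checkpoint and preference dataset load with standard Hugging Face tooling. The hyperparameters shown are illustrative placeholders, not the settings used for the actual run, which followed the Unsloth workflow.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint; DPO then optimizes chosen vs. rejected responses.
model = AutoModelForCausalLM.from_pretrained("ahmedshahriar/GhostWriterLlama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("ahmedshahriar/GhostWriterLlama-3.2-1B")

# Preference pairs: each row holds a prompt plus a chosen and a rejected response.
dataset = load_dataset("ahmedshahriar/llmGhostWriter-dpo", split="train")

# Illustrative hyperparameters only; not the values used for this model.
config = DPOConfig(
    output_dir="ghostwriter-dpo",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    beta=0.1,  # strength of the KL constraint toward the reference (SFT) model
)

trainer = DPOTrainer(
    model=model,              # a frozen copy serves as the DPO reference by default
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL versions
)
trainer.train()
```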
Intended Use
Text-generation scenarios needing coherent, on-tone writing: blog drafts, marketing copy, and expository/creative assistance.
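A minimal generation example with the Transformers text-generation pipeline. Chat-template usage here is an assumption carried over from the Llama 3.2 base model, and the sampling settings are illustrative rather than recommended defaults.

```python
from transformers import pipeline

# Load the DPO-aligned checkpoint for text generation.
generator = pipeline(
    "text-generation",
    model="ahmedshahriar/GhostWriterLlama-3.2-1B-DPO",
)

# Chat-style input; the pipeline applies the model's chat template if one is defined.
messages = [
    {"role": "user", "content": "Draft a friendly blog intro about learning to cook at home."}
]

# Sampling settings are illustrative only.
output = generator(messages, max_new_tokens=200, do_sample=True, temperature=0.7)
print(output[0]["generated_text"][-1]["content"])
```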
Out-of-Scope and Risks
- Not for factual retrieval or high-stakes decisions; outputs may contain errors.
- Alignment reflects the style and biases of the training data; multi-turn chat and tool use are not explicitly optimized.
Evaluation
Evaluated with an LLM-as-a-judge setup on the test split of the ahmedshahriar/llmGhostWriter dataset.
A judge model (GPT-4.1-nano) scored each generated response on a 1–3 scale for two criteria:
- Accuracy — factual correctness and completeness
- Style — appropriateness of tone for blog/social content (non-academic)
Averages of these scores, combined with qualitative review, indicated better factual accuracy and improved stylistic alignment over the base fine-tuned model ahmedshahriar/GhostWriterLlama-3.2-1B, approaching the performance of the strong Llama-3.2-1B-Instruct baseline on ghostwriting/expository prompts. (Caveat: LLM-judge metrics can reflect the judge's biases; consider complementary human review for critical use.)
Evaluation results are publicly available for transparency at: ahmedshahriar/GhostWriterLlama-3.2-1B-DPO-results.
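A rough sketch of the judging loop, assuming an OpenAI-compatible client and a simple 1–3 rubric prompt. The actual judge prompt, parsing, and aggregation live in the published results and accompanying notebook, so everything below is illustrative only.

```python
from statistics import mean

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative rubric; the real judge prompt may differ.
RUBRIC = (
    "Rate the response on two criteria, each on a 1-3 scale:\n"
    "Accuracy: factual correctness and completeness.\n"
    "Style: appropriateness of tone for blog/social content (non-academic).\n"
    "Reply with two integers separated by a space, e.g. '2 3'."
)

def judge(prompt: str, response: str) -> tuple[int, int]:
    """Ask the judge model to score one generated response."""
    completion = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Prompt:\n{prompt}\n\nResponse:\n{response}"},
        ],
    )
    accuracy, style = completion.choices[0].message.content.split()
    return int(accuracy), int(style)

# Example aggregation over (prompt, generated_response) pairs from the test split:
# scores = [judge(p, r) for p, r in test_pairs]
# print("Accuracy:", mean(s[0] for s in scores), "Style:", mean(s[1] for s in scores))
```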
Limitations
- Small corpora; domain diversity limited.
- Preference alignment ≠ guaranteed factuality.
- English only.
Citation
@misc{ahmedshahriar_ghostwriterllama3_2_1b_dpo_2025,
author = {Ahmed Shahriar Sakib},
title = {GhostWriterLlama-3.2-1B-DPO},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ahmedshahriar/GhostWriterLlama-3.2-1B-DPO}},
license = {Apache-2.0}
}
Acknowledgements
Thanks to Unsloth for efficient fine-tuning and to the authors of publicly available articles used to build training corpora.
Evaluation results
- Accuracy (LLM-judge, 1–3) on ahmedshahriar/llmGhostWriter (test): 2.480
- Style (LLM-judge, 1–3) on ahmedshahriar/llmGhostWriter (test): 2.950
