GhostWriterLlama-3.2-1B-DPO

  • Developed by: Ahmed Shahriar Sakib
  • License: Apache 2.0
  • Finetuned from model: ahmedshahriar/GhostWriterLlama-3.2-1B (SFT)
  • Fine-tuning datasets: ahmedshahriar/llmGhostWriter-dpo (preference pairs), ahmedshahriar/llmGhostWriter (SFT)
  • Evaluation results dataset: ahmedshahriar/GhostWriterLlama-3.2-1B-DPO-results
  • Use case: writing/ghostwriting assistant aligned via DPO to generate text closer to the preferred (chosen) responses.

Description

GhostWriterLlama-3.2-1B-DPO is a preference-aligned variant of ahmedshahriar/GhostWriterLlama-3.2-1B trained with Direct Preference Optimization (DPO) on prompt–(chosen, rejected) pairs to improve stylistic alignment and preference adherence for ghostwriting-style tasks. Training used the Unsloth workflow with Hugging Face TRL’s DPOTrainer for efficiency and compatibility with the Transformers ecosystem.
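
For reference, DPO optimizes the policy directly on these preference pairs, with the frozen SFT checkpoint serving as the reference policy; the standard objective (Rafailov et al., 2023), which TRL's DPOTrainer implements, is

\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]

where y_w and y_l are the chosen and rejected responses for prompt x, \pi_{\mathrm{ref}} is the frozen SFT model, and \beta controls how far the policy may drift from the reference.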

Training Details

  • Method: SFT → DPO (preference optimization with chosen vs. rejected responses)
  • Frameworks: Unsloth + Hugging Face TRL (DPOTrainer)
  • Data: llmGhostWriter (~2.1k instruction–response pairs) for SFT; llmGhostWriter-dpo (~1.2k prompt–(chosen, rejected) preference pairs) for DPO fine-tuning
  • Objective: Improve preference alignment and writing style suitability for blogs/social content
  • Infrastructure & Resources: this Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library (a minimal training sketch follows this list).
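
As a rough illustration, the DPO stage can be reproduced along these lines. The hyperparameters, LoRA settings, and dataset column names are assumptions for the sketch, and it assumes a recent TRL release where DPOConfig carries the hyperparameters and the tokenizer is passed as processing_class:

from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Load the SFT checkpoint that serves as the starting point (and implicit reference policy).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ahmedshahriar/GhostWriterLlama-3.2-1B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is updated during DPO.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Preference pairs; assumes the standard "prompt" / "chosen" / "rejected" columns.
dpo_dataset = load_dataset("ahmedshahriar/llmGhostWriter-dpo", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with LoRA adapters, TRL reuses the frozen base weights as the reference policy
    args=DPOConfig(
        beta=0.1,                        # illustrative; controls strength of the KL constraint
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
        num_train_epochs=1,
        output_dir="ghostwriter-dpo",
    ),
    train_dataset=dpo_dataset,
    processing_class=tokenizer,
)
trainer.train()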

Intended Use

Text-generation scenarios needing coherent, on-tone writing: blog drafts, marketing copy, and expository/creative assistance.
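
A minimal generation sketch with the Transformers pipeline (the prompt and decoding settings are illustrative, and the snippet assumes the repository hosts merged BF16 weights rather than a standalone LoRA adapter):

from transformers import pipeline
import torch

# If only a LoRA adapter is published, load it with PEFT on top of the SFT base model instead.
generator = pipeline(
    "text-generation",
    model="ahmedshahriar/GhostWriterLlama-3.2-1B-DPO",
    torch_dtype=torch.bfloat16,
)

prompt = "Write a short, friendly blog intro about staying productive while working remotely."
out = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])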

Out-of-Scope and Risks

  • Not for factual retrieval or high-stakes decisions; outputs may contain errors.
  • Alignment reflects the style and biases of the training data; multi-turn chat and tool use were not explicitly optimized.

Evaluation

Evaluated with an LLM-as-a-judge setup on the test split of the ahmedshahriar/llmGhostWriter dataset. A judge model (GPT-4.1-nano) scored each generated response on a 1–3 scale for two criteria:

  • Accuracy — factual correctness and completeness
  • Style — appropriateness of tone for blog/social content (non-academic)

Quantitative averages from these scores, combined with qualitative review, indicated better factual accuracy and improved stylistic alignment over the base fine-tuned model ahmedshahriar/GhostWriterLlama-3.2-1B, approaching the performance of the strong Llama-3.2-1B-Instruct baseline on ghostwriting/expository prompts. (Caveat: LLM-judge metrics can reflect the judge’s biases; consider complementary human review for critical use.)
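
As a rough illustration of this setup (the judge prompt, rubric wording, and OpenAI client usage below are assumptions for the sketch, not the exact evaluation script):

from openai import OpenAI

client = OpenAI()  # judge model access; expects OPENAI_API_KEY in the environment

RUBRIC = (
    "Rate the response on two criteria, each from 1 (poor) to 3 (excellent):\n"
    "- accuracy: factual correctness and completeness\n"
    "- style: appropriateness of tone for blog/social content (non-academic)\n"
    'Reply with JSON only, e.g. {"accuracy": 2, "style": 3}.'
)

def judge(prompt: str, response: str) -> str:
    """Ask the judge model (GPT-4.1-nano) to score one generated response."""
    completion = client.chat.completions.create(
        model="gpt-4.1-nano",
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Prompt:\n{prompt}\n\nResponse:\n{response}"},
        ],
    )
    return completion.choices[0].message.content  # parse this JSON and average the per-criterion scores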

Evaluation results are publicly available for transparency at: ahmedshahriar/GhostWriterLlama-3.2-1B-DPO-results.
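
They can be pulled with the datasets library (the split name below is an assumption):

from datasets import load_dataset

results = load_dataset("ahmedshahriar/GhostWriterLlama-3.2-1B-DPO-results", split="train")
print(results.column_names)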

Notebooks

  • DPO Fine-tuning (Open in Colab)
  • Evaluation (Open in Colab)

Limitations

  • Small corpora; domain diversity limited.
  • Preference alignment ≠ guaranteed factuality.
  • English only.

Citation

@misc{ahmedshahriar_ghostwriterllama3_2_1b_dpo_2025,
  author       = {Ahmed Shahriar Sakib},
  title        = {GhostWriterLlama-3.2-1B-DPO},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ahmedshahriar/GhostWriterLlama-3.2-1B-DPO}},
  license      = {Apache-2.0}
}

Acknowledgements

Thanks to Unsloth for efficient fine-tuning and to the authors of publicly available articles used to build training corpora.
