GhostWriterLlama-3.2-1B-DPO
- Developed by: Ahmed Shahriar Sakib
- License: Apache 2.0
- Finetuned from model: ahmedshahriar/GhostWriterLlama-3.2-1B (SFT)
- Fine-tuning datasets: ahmedshahriar/llmGhostWriter-dpo (preference pairs), ahmedshahriar/llmGhostWriter (SFT)
- Evaluation results dataset: ahmedshahriar/GhostWriterLlama-3.2-1B-DPO-results
- Use case: writing/ghostwriting assistant aligned via DPO to generate more preferred text.
Description
GhostWriterLlama-3.2-1B-DPO is a preference-aligned variant of ahmedshahriar/GhostWriterLlama-3.2-1B trained with Direct Preference Optimization (DPO) on prompt–(chosen, rejected) pairs to improve stylistic alignment and preference adherence for ghostwriting-style tasks. Training used the Unsloth workflow with Hugging Face TRL’s DPOTrainer for efficiency and compatibility with the Transformers ecosystem.
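For illustration, a DPO preference record pairs one prompt with a chosen and a rejected completion. The snippet below sketches what such a record might look like; the field names follow the convention TRL's DPOTrainer expects, but the exact schema and contents of llmGhostWriter-dpo may differ.

```python
# Illustrative DPO preference record. The "prompt"/"chosen"/"rejected" keys are the
# column names TRL's DPOTrainer expects; the example text is made up and is not
# taken from the ahmedshahriar/llmGhostWriter-dpo dataset.
preference_pair = {
    "prompt": "Write a short blog intro about remote work productivity.",
    "chosen": "Remote work has quietly rewritten the rules of productivity...",
    "rejected": "Remote work is when people work from home. It has pros and cons.",
}
```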
Training Details
- Method: SFT → DPO (preference optimization with chosen vs. rejected responses)
- Frameworks: Unsloth + Hugging Face TRL (DPOTrainer)
- Data: llmGhostWriter (~2.1k instruction–response pairs) for SFT; llmGhostWriter-dpo (~1.2k prompt–preference pairs) for DPO fine-tuning
- Objective: Improve preference alignment and stylistic suitability for blog/social content
- Infrastructure & resources: This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
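A minimal sketch of the DPO stage with plain TRL, assuming the SFT checkpoint and preference dataset load with standard Hugging Face tooling. The hyperparameters shown are illustrative placeholders, not the settings used for the actual run, which followed the Unsloth workflow.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint; DPO then optimizes chosen vs. rejected responses.
model = AutoModelForCausalLM.from_pretrained("ahmedshahriar/GhostWriterLlama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("ahmedshahriar/GhostWriterLlama-3.2-1B")

# Preference pairs: each row holds a prompt plus a chosen and a rejected response.
dataset = load_dataset("ahmedshahriar/llmGhostWriter-dpo", split="train")

# Illustrative hyperparameters only; not the values used for this model.
config = DPOConfig(
    output_dir="ghostwriter-dpo",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    beta=0.1,  # strength of the KL constraint toward the reference (SFT) model
)

trainer = DPOTrainer(
    model=model,              # a frozen copy serves as the DPO reference by default
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL versions
)
trainer.train()
```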
Intended Use
Text-generation scenarios needing coherent, on-tone writing: blog drafts, marketing copy, and expository/creative assistance.
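A minimal generation example with the Transformers text-generation pipeline. Chat-template usage here is an assumption carried over from the Llama 3.2 base model, and the sampling settings are illustrative rather than recommended defaults.

```python
from transformers import pipeline

# Load the DPO-aligned checkpoint for text generation.
generator = pipeline(
    "text-generation",
    model="ahmedshahriar/GhostWriterLlama-3.2-1B-DPO",
)

# Chat-style input; the pipeline applies the model's chat template if one is defined.
messages = [
    {"role": "user", "content": "Draft a friendly blog intro about learning to cook at home."}
]

# Sampling settings are illustrative only.
output = generator(messages, max_new_tokens=200, do_sample=True, temperature=0.7)
print(output[0]["generated_text"][-1]["content"])
```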
Out-of-Scope and Risks
- Not for factual retrieval or high-stakes decisions; outputs may contain errors.
- Alignment reflects the style and biases of the training data; multi-turn chat and tool use are not explicitly optimized.
Evaluation
Evaluated with an LLM-as-a-judge setup on the test split of the ahmedshahriar/llmGhostWriter dataset.
A judge model (GPT-4.1-nano) scored each generated response on a 1–3 scale for two criteria:
- Accuracy — factual correctness and completeness
- Style — appropriateness of tone for blog/social content (non-academic)
Averages of these scores, combined with qualitative review, indicated better factual accuracy and improved stylistic alignment over the base fine-tuned model ahmedshahriar/GhostWriterLlama-3.2-1B, approaching the performance of the strong Llama-3.2-1B-Instruct baseline on ghostwriting/expository prompts. (Caveat: LLM-judge metrics can reflect the judge's biases; consider complementary human review for critical use.)
Evaluation results are publicly available for transparency at: ahmedshahriar/GhostWriterLlama-3.2-1B-DPO-results.
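A rough sketch of the judging loop, assuming an OpenAI-compatible client and a simple 1–3 rubric prompt. The actual judge prompt, parsing, and aggregation live in the published results and accompanying notebook, so everything below is illustrative only.

```python
from statistics import mean

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative rubric; the real judge prompt may differ.
RUBRIC = (
    "Rate the response on two criteria, each on a 1-3 scale:\n"
    "Accuracy: factual correctness and completeness.\n"
    "Style: appropriateness of tone for blog/social content (non-academic).\n"
    "Reply with two integers separated by a space, e.g. '2 3'."
)

def judge(prompt: str, response: str) -> tuple[int, int]:
    """Ask the judge model to score one generated response."""
    completion = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Prompt:\n{prompt}\n\nResponse:\n{response}"},
        ],
    )
    accuracy, style = completion.choices[0].message.content.split()
    return int(accuracy), int(style)

# Example aggregation over (prompt, generated_response) pairs from the test split:
# scores = [judge(p, r) for p, r in test_pairs]
# print("Accuracy:", mean(s[0] for s in scores), "Style:", mean(s[1] for s in scores))
```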
Limitations
- Small corpora; domain diversity limited.
- Preference alignment ≠ guaranteed factuality.
- English only.
Citation
@misc{ahmedshahriar_ghostwriterllama3_2_1b_dpo_2025,
author = {Ahmed Shahriar Sakib},
title = {GhostWriterLlama-3.2-1B-DPO},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ahmedshahriar/GhostWriterLlama-3.2-1B-DPO}},
license = {Apache-2.0}
}
Acknowledgements
Thanks to Unsloth for efficient fine-tuning and to the authors of publicly available articles used to build training corpora.
Evaluation results
- Accuracy (LLM-judge, 1–3) on ahmedshahriar/llmGhostWriter (test): 2.480
- Style (LLM-judge, 1–3) on ahmedshahriar/llmGhostWriter (test): 2.950
