
dq-gpt2-instruct

A recreation of the GPT-2 model from scratch.

Model Description

Instruction fine-tuned GPT-2 model.

  • Architecture: GPT-2
  • Parameters: 163,109,376 (~163M)
  • Layers: 12
  • Hidden Size: 768
  • Attention Heads: 12
  • Context Length: 1024
  • Vocabulary Size: 50304
  • Precision: bfloat16
  • Dataset: HuggingFaceFW/fineweb-edu

Usage

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("csabakecskemeti/dq-gpt2-instruct-exp1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # Use standard GPT-2 tokenizer

# Generate text with instruction format
prompt = "### Instruction:\nWhat is Python?\n\n### Response:\n"
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs, max_length=200, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training

The model was pretrained on the fineweb-edu dataset and then fine-tuned on the Alpaca GPT-4 dataset for instruction following.

Pretraining

Pretraining loss curve (dq-gpt2-loss)

Instruction fine-tuning

Instruction fine-tuning loss curve (dq-gpt2-ft-loss)

Training hardware


DGX Spark

  • Pretraining: ~19 days
  • Instruction fine-tuning: ~2 hours