# dq-gpt2-instruct
A recreation of the GPT-2 model from scratch.
## Model Description
Instruction fine-tuned GPT-2 model.
- Architecture: GPT-2
- Parameters: ~163,109,376
- Layers: 12
- Hidden Size: 768
- Attention Heads: 12
- Context Length: 1024
- Vocabulary Size: 50304
- Precision: bfloat16
- Dataset: HuggingFaceFW/fineweb-edu
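
For reference, here is a minimal sketch of how these hyperparameters map onto a `transformers` `GPT2Config`; the exact configuration used for training is an assumption and may differ from the author's setup.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical config mirroring the numbers listed above; the author's
# actual training configuration may differ.
config = GPT2Config(
    n_layer=12,        # layers
    n_embd=768,        # hidden size
    n_head=12,         # attention heads
    n_positions=1024,  # context length
    vocab_size=50304,  # padded vocabulary size
)

model = GPT2LMHeadModel(config)
# Note: the reported parameter count depends on whether the input and
# output embeddings are tied.
print(f"{model.num_parameters():,} parameters")
```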
## Usage
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("csabakecskemeti/dq-gpt2-instruct-exp1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # Use the standard GPT-2 tokenizer

# Generate text with the instruction format
prompt = "### Instruction:\nWhat is Python?\n\n### Response:\n"
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs, max_length=200, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
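
Since `generate` returns the prompt together with the completion, a small post-processing step (assumed, not part of the original card) keeps only the answer:

```python
# Keep only the text after the "### Response:" marker (hypothetical helper).
answer = response.split("### Response:")[-1].strip()
print(answer)
```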
## Training
The model was pretrained on the fineweb-edu dataset and fine-tuned on the Alpaca GPT-4 dataset for instruction following.
### Pretraining
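
A minimal sketch (not from the original card) of streaming the pretraining corpus with the `datasets` library; the author's actual preprocessing, tokenization, and sampling setup is not described here.

```python
from datasets import load_dataset

# Hypothetical: stream fineweb-edu rather than downloading it in full.
fineweb = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
for example in fineweb.take(1):
    print(example["text"][:200])
```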
### Instruction fine-tuning
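
A minimal sketch of how an Alpaca-style record could be mapped onto the `### Instruction` / `### Response` template shown in the Usage section; the exact prompt template used for fine-tuning is an assumption.

```python
def format_alpaca_example(example: dict) -> str:
    """Render an Alpaca-style record in the assumed instruction template."""
    instruction = example["instruction"]
    if example.get("input"):
        instruction = f"{instruction}\n\n{example['input']}"
    return f"### Instruction:\n{instruction}\n\n### Response:\n{example['output']}"

record = {
    "instruction": "What is Python?",
    "input": "",
    "output": "Python is a high-level programming language.",
}
print(format_alpaca_example(record))
```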
### Training hardware

DGX Spark
- Pretraining: ~19 days
- Instruction fine-tuning: ~2 hours