BharatVLM committed
Commit 3df5c20 · verified · 1 Parent(s): 1ef8ffa

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +4 -1
README.md CHANGED
@@ -22,7 +22,7 @@ This is a GPT-2 language model trained from scratch on Assamese monolingual text
 
 ## 📖 Model Description
 
-The Assamese GPT-2 model is based on the standard GPT-2 decoder-only transformer architecture. It is capable of generating grammatically coherent and contextually relevant Assamese text and serves as a foundation for downstream NLP tasks such as:
+The Assamese GPT-2 model is based on the standard GPT-2 decoder-only transformer architecture, with 12 layers, 12 attention heads, and a hidden size of 768. It is capable of generating grammatically coherent and contextually relevant Assamese text and serves as a foundation for downstream NLP tasks such as:
 
 - Language modeling
 - Text completion/generation
@@ -55,6 +55,9 @@ Data preprocessing included:
 ## 🧪 Training Procedure
 
 ### Hyperparameters
+- Architecture: GPT-2 (12 layers, 12 heads, 768 hidden size)
+- Tokenizer vocab size: 50,000
+- Context window size: 1024 tokens
 - Learning rate: 5e-5
 - Epochs: 20
 - Batch size: 64
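The configuration this commit adds to the README (12 layers, 12 heads, hidden size 768, vocab 50,000, context 1024) works out to roughly 124M parameters. A back-of-envelope sketch, assuming the standard GPT-2 block layout with tied input/output embeddings, a 4x MLP expansion, and biases on all linear layers and LayerNorms — none of which the diff states explicitly:

```python
# Rough parameter count for the configuration stated in the diff.
# Assumptions (not in the diff): standard GPT-2 layout, tied input/output
# embeddings, 4x MLP expansion, LayerNorm weight+bias, biased linears.
def gpt2_param_count(n_layer=12, d_model=768, vocab=50_000, n_ctx=1024):
    embeddings = vocab * d_model + n_ctx * d_model     # token + position tables
    attn = 4 * d_model * d_model + 4 * d_model         # QKV + output projection weights and biases
    mlp = 8 * d_model * d_model + 5 * d_model          # two linears: d -> 4d -> d, with biases
    layer_norms = 2 * 2 * d_model                      # two LayerNorms per block
    block = attn + mlp + layer_norms
    return embeddings + n_layer * block + 2 * d_model  # plus the final LayerNorm

print(f"{gpt2_param_count():,}")  # 124,242,432
```

Note that the head count does not change the parameter total — splitting the 768-wide attention into 12 heads of 64 dimensions only reshapes the same projection matrices.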