Upload folder using huggingface_hub
README.md CHANGED
@@ -22,7 +22,7 @@ This is a GPT-2 language model trained from scratch on Assamese monolingual text
 
 ## 📖 Model Description
 
-The Assamese GPT-2 model is based on the standard GPT-2 decoder-only transformer architecture. It is capable of generating grammatically coherent and contextually relevant Assamese text and serves as a foundation for downstream NLP tasks such as:
+The Assamese GPT-2 model is based on the standard GPT-2 decoder-only transformer architecture with 12 layers, 12 attention heads, and a hidden size of 768. It is capable of generating grammatically coherent and contextually relevant Assamese text and serves as a foundation for downstream NLP tasks such as:
 
 - Language modeling
 - Text completion/generation
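For reference, using this model for generation follows the standard `transformers` causal-LM pattern. A minimal sketch, assuming a hypothetical repo id (`your-username/assamese-gpt2`) in place of the actual one:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual model id for this repository.
model_id = "your-username/assamese-gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode an Assamese prompt and sample a continuation.
prompt = "অসম"  # "Assam"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,   # sampling settings here are assumptions, not from the card
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```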
@@ -55,6 +55,9 @@ Data preprocessing included:
 ## 🧪 Training Procedure
 
 ### Hyperparameters
+- Architecture: GPT-2 (12 layers, 12 heads, 768 hidden size)
+- Tokenizer vocab size: 50,000
+- Context window size: 1024 tokens
 - Learning rate: 5e-5
 - Epochs: 20
 - Batch size: 64
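The listed architecture and optimization settings map directly onto `GPT2Config` and `TrainingArguments`. A minimal sketch of that mapping, with the output path and any argument not in the list above as assumptions:

```python
from transformers import GPT2Config, GPT2LMHeadModel, TrainingArguments

# Architecture from the hyperparameter list: 12 layers, 12 heads,
# hidden size 768, 50,000-token vocab, 1024-token context window.
config = GPT2Config(
    vocab_size=50_000,
    n_positions=1024,
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(config)  # randomly initialized, trained from scratch

# Optimization settings from the list; output_dir is a hypothetical path.
training_args = TrainingArguments(
    output_dir="assamese-gpt2",
    learning_rate=5e-5,
    num_train_epochs=20,
    per_device_train_batch_size=64,
)
```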