Update README.md

metrics:
- accuracy
---

# Turkish GPT-2 Model (Experimental)

I've made available a GPT-2 model for Turkish that I trained on a variety of texts.

The model is intended to serve as a starting point for fine-tuning on domain-specific texts.

## Training Source

I used a Turkish corpus drawn from a variety of written and spoken sources.

Using the Tokenizers library, I built a 50k-token vocabulary from the training corpus.
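The vocabulary step can be sketched with the Hugging Face Tokenizers library. The tiny in-memory corpus, the special tokens, and the whitespace pre-tokenizer below are illustrative assumptions; only the 50k target size comes from the description above.

``` python
# Hedged sketch: training a BPE vocabulary with the Tokenizers library.
# The two-sentence corpus is a stand-in for the full Turkish corpus;
# the special tokens and pre-tokenizer choice are assumptions.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

corpus = [
    "Akşamüstü yolda ilerlerken güneş batıyordu.",
    "Türkçe metinler üzerinde bir dil modeli eğitiyoruz.",
]

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=50_000, special_tokens=["<unk>", "<s>", "</s>"])
tokenizer.train_from_iterator(corpus, trainer)

print(tokenizer.get_vocab_size())
```

On a toy corpus the trainer stops far short of the 50k target; reaching it requires the full training corpus.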

After building the vocabulary, I trained the GPT-2 model for Turkish on the entire training corpus for ten epochs.
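A from-scratch setup matching that description might look like the following. The architecture hyperparameters are assumptions (GPT-2 small defaults), since the exact configuration is not stated.

``` python
# Hedged sketch: initialising a GPT-2 model from scratch around the
# custom 50k vocabulary. The sizes follow GPT-2 small defaults and are
# assumptions, not the author's confirmed configuration.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50_000,  # matches the custom Turkish BPE vocabulary
    n_positions=1024,   # maximum context length
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} parameters")
```

Training this model on the tokenised corpus for ten epochs can then be done with the `Trainer` API or a custom PyTorch loop.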

The model itself can be used in this way:

``` python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt-2-experimental")
```

Note that `AutoModelWithLMHead` is deprecated in recent Transformers releases; `AutoModelForCausalLM` is the drop-in replacement for causal language models like GPT-2.

To generate text, we can use the Transformers pipeline:

``` python
from transformers import pipeline

pipe = pipeline('text-generation',
                model="ahmet1338/gpt-2-experimental",
                tokenizer="ahmet1338/gpt-2-experimental")
# max_length is a generation argument and is passed at call time;
# pipeline() does not accept it via a `config=` dict
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
```