Wordcel is a Llama3 fine-tune intended to be used as a mid-training checkpoint for more specific chat/RP/storywriting applications.

It has been trained from Llama3 8B Base on a composite dataset that highlights reasoning, (uncensored) stories, classic literature, and assorted interpersonal intelligence tasks.

Components of the composite dataset include [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5), and [Grimulkan](https://huggingface.co/grimulkan)'s [Theory of Mind](https://huggingface.co/datasets/grimulkan/theory-of-mind) and [Physical Reasoning](https://huggingface.co/datasets/grimulkan/physical-reasoning) datasets.

It is trained at a context length of 32k tokens, using linear RoPE scaling with a factor of 4.0. Derivative models should be capable of generalizing to 32k tokens as a result.

If you train a model using this checkpoint, please give clear attribution! The Llama 3 base license likely applies.
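
As a usage sketch (not part of the original card), something like the following should load the checkpoint at its full 32k context with the linear RoPE scaling described above, assuming the Hugging Face `transformers` library. The repo id is a placeholder, and the `rope_scaling` argument is only needed if the saved config does not already record it.

```python
# Hypothetical loading sketch -- the repo id below is a placeholder, not the real path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/wordcel-llama3-8b"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Linear RoPE scaling with factor 4.0, matching the training setup described above;
    # redundant if the checkpoint's config already carries it.
    rope_scaling={"type": "linear", "factor": 4.0},
    max_position_embeddings=32768,  # 32k-token context
    torch_dtype=torch.bfloat16,
    device_map="auto",              # requires `accelerate`
)
```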