Update README.md
From there, each of the four threads was separately task-tuned on 2 datasets each.
Various methods of combining those via merge were tested, with this one scoring highest on EQ-Bench as an indicator.
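For reference, a merge of this shape could be expressed in a mergekit config roughly like the sketch below. This is a hypothetical illustration only — the model names are placeholders, not the actual ancestors or task-tuned threads used here.

```yaml
# Hypothetical mergekit config sketch for a Model Stock merge.
# All model names below are placeholders, not the real inputs.
merge_method: model_stock
base_model: example-org/base-model
models:
  - model: example-org/task-tuned-a
  - model: example-org/task-tuned-b
  - model: example-org/task-tuned-c
  - model: example-org/task-tuned-d
dtype: bfloat16
```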
My understanding of the Model Stock merge method is that it reduces task adaptation to a significant degree, but also significantly limits forgetting caused by training.
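That trade-off follows from how the method picks its interpolation ratio: the merged weights sit between the average of the fine-tuned models and the pretrained base, pulled toward the base when the fine-tuning deltas disagree. A toy numpy sketch of that geometry, under my reading of the Model Stock paper (not mergekit's actual implementation):

```python
import numpy as np

def model_stock_layer(w_base, w_finetuned):
    """Toy sketch of the Model Stock idea for a single weight tensor.

    w_base: pretrained weights; w_finetuned: list of k task-tuned variants.
    """
    k = len(w_finetuned)
    deltas = [w - w_base for w in w_finetuned]
    # Average pairwise cosine similarity between the fine-tuning deltas.
    cos = np.mean([
        np.dot(a.ravel(), b.ravel()) / (np.linalg.norm(a) * np.linalg.norm(b))
        for i, a in enumerate(deltas) for b in deltas[i + 1:]
    ])
    # Interpolation ratio as given in the paper: t = k*cos / (1 + (k-1)*cos).
    t = k * cos / (1 + (k - 1) * cos)
    w_avg = np.mean(w_finetuned, axis=0)
    # Dissimilar fine-tunes give small t, keeping the result near the base
    # (limiting forgetting); similar fine-tunes give large t, keeping more
    # of the shared task adaptation.
    return t * w_avg + (1 - t) * w_base
```

With orthogonal fine-tuning deltas the ratio collapses toward the base weights; with identical deltas it keeps the full adaptation — which is why the merge both dampens per-task tuning and protects base capability.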
I hope that the adaptation, especially over two stages, is still sufficient to aid the longer-context and multi-turn conversational abilities inherited from the ancestor models, and to add some individual style while retaining a fair amount of their capability.

This model's refusals are... not nonexistent, but certainly don't rely on them.