Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ For a final model composed of:
 ----
 
 This was done for the sake of testing the theory of how 'long context' tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
 
-Different from the first
+Different from the first merge [(that sports a 50/50 ratio)](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), this one is lopsided towards the Chat base model, to give another comparison point for the effects of CTX span merging, and to produce a model that is primarily focused on chatting.
 
 There are two objectives for this merge: the first is to see how much of the 65k-Storywriter model is necessary to raise the ceiling of the final model's context size,
 and the second is to make the base Chat model less dry, and slightly more fun / verbose, and intelligent, by adding the literature / Instruct based models into it.
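For readers unfamiliar with how a lopsided merge like this is produced, below is a minimal sketch of weighted state-dict interpolation between two checkpoints. The 75/25 ratio, the checkpoint paths, and the output path are illustrative assumptions, not the exact recipe or weights used for this model.

```python
# Minimal sketch of a lopsided weight merge, assuming both checkpoints share
# the same architecture and parameter names. The ratio and paths below are
# hypothetical placeholders, not the values used for this repository.
import torch

CHAT_RATIO = 0.75              # assumed weight favouring the Chat base model
STORY_RATIO = 1.0 - CHAT_RATIO # remaining weight for the 65k-Storywriter model

chat_sd = torch.load("mpt-7b-chat/pytorch_model.bin", map_location="cpu")
story_sd = torch.load("mpt-7b-storywriter/pytorch_model.bin", map_location="cpu")

merged_sd = {}
for name, chat_param in chat_sd.items():
    if name in story_sd:
        # Linearly interpolate each shared tensor at the chosen ratio.
        merged_sd[name] = CHAT_RATIO * chat_param + STORY_RATIO * story_sd[name]
    else:
        # Keep parameters that exist only in the Chat base model unchanged.
        merged_sd[name] = chat_param

torch.save(merged_sd, "mpt-7b-chat-storywriter-merge/pytorch_model.bin")
```

The merged state dict can then be loaded into the usual model class; how much of the Storywriter's long-context behaviour survives at a given ratio is exactly the question this merge is probing.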