This model is pretrained as a reference baseline for the Based model provided here: https://huggingface.co/hazyresearch/based-1b-50b.

Both checkpoints are pretrained on **50Bn tokens** of the Pile, in the exact same data order, using next-token prediction.
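For readers unfamiliar with the objective: next-token prediction is standard causal language modeling. A minimal PyTorch sketch of the loss (generic, not taken from the training repo; `logits` shape is an assumption):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy where position t predicts token t+1.

    Assumes logits of shape (batch, seq_len, vocab) and input_ids of
    shape (batch, seq_len); generic sketch, not this repo's training code.
    """
    # Drop the last logit (nothing follows it) and the first target token.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
```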
A WandB report for training is here: https://api.wandb.ai/links/hazy-research/ggo9rst2
### Model Sources
The model is a standard Mamba model and uses the model code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm

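As a quick usage sketch (not from the model card): the checkpoint can presumably be loaded with the `mamba_ssm` package linked above. The Hub repo id below is a placeholder for this checkpoint's actual id, and the GPT-NeoX tokenizer is an assumption (the tokenizer commonly paired with Pile-trained models):

```python
# Hedged sketch: assumes `pip install mamba-ssm transformers torch` and a CUDA GPU.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
# Assumption: Pile-trained models typically use the GPT-NeoX tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# Placeholder repo id -- replace with this checkpoint's actual Hugging Face id.
model = MambaLMHeadModel.from_pretrained(
    "hazyresearch/<this-checkpoint>", device=device, dtype=torch.float16
)

prompt = "The Pile is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
# mamba_ssm's generate() takes a total max_length and decodes greedily by default.
out = model.generate(input_ids=input_ids, max_length=input_ids.shape[1] + 32)
print(tokenizer.decode(out[0]))
```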
The training code is provided here and can be used to reproduce training: https://github.com/HazyResearch/based

The paper for this work is here; its appendix includes additional experimental details and hyperparameters: https://arxiv.org/abs/2402.18668
### Uses