This model is pretrained as a reference baseline for the Based model provided here: https://huggingface.co/hazyresearch/based-1b-50b.

Both checkpoints are pretrained on **50Bn tokens** of the Pile, in the exact same data order, using next-token prediction.
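For readers unfamiliar with the objective: next-token prediction is standard causal language modeling. A minimal PyTorch sketch of the loss (generic, not taken from the training repo; `logits` shape is an assumption):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy where position t predicts token t+1.

    Assumes logits of shape (batch, seq_len, vocab) and input_ids of
    shape (batch, seq_len); generic sketch, not this repo's training code.
    """
    # Drop the last logit (nothing follows it) and the first target token.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
```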
A WandB report for training is here: https://api.wandb.ai/links/hazy-research/ggo9rst2
### Model Sources
The model is a standard Mamba model and uses the model code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm

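As a quick usage sketch (not from the model card): the checkpoint can presumably be loaded with the `mamba_ssm` package linked above. The Hub repo id below is a placeholder for this checkpoint's actual id, and the GPT-NeoX tokenizer is an assumption (the tokenizer commonly paired with Pile-trained models):

```python
# Hedged sketch: assumes `pip install mamba-ssm transformers torch` and a CUDA GPU.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
# Assumption: Pile-trained models typically use the GPT-NeoX tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# Placeholder repo id -- replace with this checkpoint's actual Hugging Face id.
model = MambaLMHeadModel.from_pretrained(
    "hazyresearch/<this-checkpoint>", device=device, dtype=torch.float16
)

prompt = "The Pile is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
# mamba_ssm's generate() takes a total max_length and decodes greedily by default.
out = model.generate(input_ids=input_ids, max_length=input_ids.shape[1] + 32)
print(tokenizer.decode(out[0]))
```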
The training code is provided here and can be used to reproduce training: https://github.com/HazyResearch/based

The paper for this work is here; its appendix includes additional experimental details and hyperparameters: https://arxiv.org/abs/2402.18668
### Uses