Quantization made by Richard Erkhov.
scaling-vocab-3b-32k-overtrain - EXL2
- Model creator: https://huggingface.co/sail/
- Original model: https://huggingface.co/sail/scaling-vocab-3b-32k-overtrain/
Available sizes
Branch | Bits | Description |
---|---|---|
8_0 | 8.0 | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
6_5 | 6.5 | Very similar to 8.0, good tradeoff of size vs performance, recommended. |
5_0 | 5.0 | Slightly lower quality vs 6.5, but usable. |
4_25 | 4.25 | GPTQ equivalent bits per weight, slightly higher quality. |
3_5 | 3.5 | Lower quality, only use if you have to. |
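As a rough guide to how the bits-per-weight column translates into download size, the sketch below multiplies a parameter count by the bits per weight. It assumes a nominal 3 billion parameters (taken from the "3b" in the model name, not from a measured count) and ignores any layers stored at a different precision as well as file overhead, so treat the numbers as ballpark figures only. The names `NOMINAL_PARAMS`, `BRANCH_BITS`, and `estimated_weight_gb` are made up for this illustration.

```python
# Assumption: "3b" in the model name means roughly 3 billion parameters.
NOMINAL_PARAMS = 3_000_000_000

# Branch name -> average bits per weight, copied from the table above.
BRANCH_BITS = {"8_0": 8.0, "6_5": 6.5, "5_0": 5.0, "4_25": 4.25, "3_5": 3.5}


def estimated_weight_gb(bits_per_weight: float, n_params: int = NOMINAL_PARAMS) -> float:
    """Rough weight size in GB (10^9 bytes): parameters * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9


# Print a ballpark size for every branch listed in the table.
for branch, bits in BRANCH_BITS.items():
    print(f"{branch}: ~{estimated_weight_gb(bits):.2f} GB")
```

The actual repository sizes may differ, since this estimate covers the quantized weights alone.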
Download instructions
With git:
git clone --single-branch --branch 6_5 https://huggingface.co/sail_-_scaling-vocab-3b-32k-overtrain-exl2 scaling-vocab-3b-32k-overtrain-6_5
With huggingface hub:
pip3 install huggingface-hub
To download a specific branch, use the --revision parameter. For example, to download the 6.5 bpw branch:
Linux:
huggingface-cli download sail_-_scaling-vocab-3b-32k-overtrain-exl2 --revision 6_5 --local-dir scaling-vocab-3b-32k-overtrain-6_5 --local-dir-use-symlinks False
Windows (which apparently doesn't like _ in folders sometimes?):
huggingface-cli download sail_-_scaling-vocab-3b-32k-overtrain-exl2 --revision 6_5 --local-dir scaling-vocab-3b-32k-overtrain-6.5 --local-dir-use-symlinks False
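The same download can also be scripted from Python. The sketch below is one possible way to do it, assuming the `huggingface_hub` package installed above and its `snapshot_download` function; the repository id is copied verbatim from the commands above, and the helper names `local_dir_for` and `download_branch` are hypothetical. It mirrors the Linux/Windows folder naming used in the CLI examples.

```python
import platform
from typing import Optional

# Repository id copied verbatim from the CLI commands above.
REPO_ID = "sail_-_scaling-vocab-3b-32k-overtrain-exl2"


def local_dir_for(branch: str, system: Optional[str] = None) -> str:
    """Build the target folder name, swapping "_" for "." in the branch on Windows."""
    system = system or platform.system()
    suffix = branch.replace("_", ".") if system == "Windows" else branch
    return f"scaling-vocab-3b-32k-overtrain-{suffix}"


def download_branch(branch: str = "6_5") -> str:
    """Download one quantization branch and return the local path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=branch,  # the branch name selects the bits-per-weight variant
        local_dir=local_dir_for(branch),
    )
```

Calling `download_branch("6_5")` would fetch the 6.5 bpw branch into a folder named like the one in the CLI examples; pass another branch name from the table to get a different size.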
Original model description:
- Datasets: cerebras/SlimPajama-627B
- Language: en
The pre-trained 3B model with a vocabulary size of 43K from the paper Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies. In this paper, we investigate how vocabulary size affects language model scaling laws.
Based on our approach, we predict that the optimal vocabulary size for a 3B model is about 43K. We then train a Llama-based 3B model on a sampled version of the SlimPajama dataset. The model with the 43K vocabulary outperforms the model with the common vocabulary size of 32K, despite using fewer training tokens. Notably, the proposed approach can be applied to other model sizes.