Update README.md
README.md CHANGED
@@ -3,13 +3,15 @@ license: mit
 datasets:
 - tiiuae/falcon-refinedweb
 - HuggingFaceFW/fineweb
+base_model:
+- cckm/tinymistral_950m
 language:
 - en
 pipeline_tag: text-generation
 library_name: PyTorch
 ---
 
-## A deep and narrow
+## A deep and narrow Mistral model (950M params)
 This checkpoint is for a small (950M params), deep and narrow (40 layers, hidden size=1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is meant for edge applications.
 
 It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to epoch 202418). It is a base model and has not gone through instruction or chat fine-tuning.
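
For reference, a minimal usage sketch. It assumes the checkpoint can be loaded through Hugging Face transformers' `AutoModelForCausalLM`, which the card does not confirm (it lists PyTorch as the library, so the repo may instead ship raw state-dict weights); the repo id is taken from the `base_model` field above. Since this is a base model with no instruct or chat tuning, it is prompted with plain text completion rather than a chat template.

```python
# Hedged sketch: assumes a transformers-compatible checkpoint at the
# base_model repo id; the card only states library_name: PyTorch, so
# adjust the loading step if the repo ships raw weights instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "cckm/tinymistral_950m"  # from the card's base_model field
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16)
model.eval()

# Base model, no instruct/chat fine-tuning: plain completion prompting.
prompt = "Edge devices benefit from small language models because"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```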