---
language:
- en
license: mit
tags:
- text-generation
- llama
- tinystories
- storytelling
datasets:
- roneneldan/TinyStories
widget:
- text: "Once upon a time, there was a"
  example_title: "Story Beginning"
- text: "One day, Lily met a"
  example_title: "Character Introduction"
- text: "The little boy was very happy because"
  example_title: "Story Continuation"
---

# TinyStories Llama Model

## Model Description

This is a small Llama-architecture language model trained on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories). The model is designed to generate simple, coherent children's stories using a vocabulary and concepts that a typical 3-4 year old would understand.

**Model Architecture:** Llama 2

**Training Framework:** PyTorch

**Implementation:** Based on [llama2.c](https://github.com/karpathy/llama2.c)
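
As a quick usage sketch, the snippet below loads the model with the Hugging Face `transformers` API and samples a story. It assumes the llama2.c checkpoint has been converted to standard Hugging Face Llama format; the repository id shown is a placeholder.

```python
# Minimal generation sketch (assumes an HF-format export of the checkpoint;
# the model id below is a placeholder, not this repository's actual id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/tinystories-llama"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time, there was a"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling settings match the example outputs further down (temperature=0.8, top_p=0.9).
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```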

## Model Details

### Architecture Hyperparameters

- **Dimension:** 288
- **Number of Layers:** 6
- **Number of Attention Heads:** 6
- **Number of KV Heads:** 6
- **Vocabulary Size:** 32,000 (Llama 2 tokenizer)
- **Maximum Sequence Length:** 256 tokens
- **Dropout:** 0.0
- **Hidden Dimension Multiple:** 32

**Total Parameters:** ~15M
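
As a reference point, these settings map onto a configuration object along the following lines. This is an illustrative sketch; the field names mirror llama2.c's `ModelArgs`, but the exact class lives in the training code.

```python
# Illustrative configuration mirroring the hyperparameters above (not the exact class).
from dataclasses import dataclass

@dataclass
class ModelConfig:
    dim: int = 288            # model (embedding) dimension
    n_layers: int = 6         # transformer blocks
    n_heads: int = 6          # attention heads
    n_kv_heads: int = 6       # key/value heads (equal to n_heads, so no grouped-query sharing)
    vocab_size: int = 32000   # Llama 2 tokenizer vocabulary
    multiple_of: int = 32     # FFN hidden dimension is rounded up to a multiple of this
    max_seq_len: int = 256    # maximum context length in tokens
    dropout: float = 0.0      # no dropout during training

config = ModelConfig()
```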

### Training Hyperparameters

- **Batch Size:** 128 (micro-batch)
- **Gradient Accumulation Steps:** 4
- **Effective Batch Size:** 512
- **Learning Rate:** 5e-4 (max)
- **Learning Rate Schedule:** Cosine decay with warmup
- **Warmup Iterations:** 1,000
- **Total Training Iterations:** 100,000
- **Weight Decay:** 0.1
- **Beta1:** 0.9
- **Beta2:** 0.95
- **Gradient Clipping:** 1.0
- **Optimizer:** AdamW
- **Precision:** bfloat16 (with mixed precision training)

**Tokens per Iteration:** ~131,072 (4 grad accum × 1 process × 128 batch × 256 seq len)
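
A sketch of the optimizer and learning-rate schedule these settings describe is shown below. It reproduces the stated values but is not the exact training script; the tiny parameter tensor exists only to make the loop runnable.

```python
# Cosine-decay-with-warmup schedule and AdamW setup matching the values above.
import math
import torch

max_lr = 5e-4        # peak learning rate
warmup_iters = 1_000
max_iters = 100_000

def lr_at(it: int) -> float:
    """Linear warmup to max_lr, then cosine decay toward zero."""
    if it < warmup_iters:
        return max_lr * it / warmup_iters
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

# AdamW with the stated betas and weight decay (dummy parameter for illustration).
params = [torch.nn.Parameter(torch.zeros(10))]
optimizer = torch.optim.AdamW(params, lr=max_lr, betas=(0.9, 0.95), weight_decay=0.1)

for it in range(3):  # a few illustrative iterations
    for group in optimizer.param_groups:
        group["lr"] = lr_at(it)
    # Each real iteration accumulates gradients over 4 micro-batches of
    # 128 sequences × 256 tokens (~131,072 tokens per iteration).
    loss = (params[0] ** 2).sum()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, 1.0)  # gradient clipping at 1.0
    optimizer.step()
    optimizer.zero_grad()
```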

## Intended Use

This model is intended for:

- Generating simple children's stories
- Educational demonstrations of small-scale language model training
- Research into emergent capabilities in small language models
- Experimentation with efficient inference (e.g., pure C implementation)

## Limitations

- **Domain-Specific:** The model is trained exclusively on simple stories and will not perform well on general text generation tasks
- **Vocabulary:** Limited to concepts and language appropriate for very young children
- **Context Length:** Maximum sequence length of 256 tokens limits story length
- **No Instruction Following:** This is a base model without instruction tuning

## Training Data

The model was trained on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories), which consists of short stories generated to contain only words that a typical 3-4 year old would understand. The dataset was created to study the capabilities of small language models.

**Dataset Size:** ~2.1M stories

**Vocabulary:** Words understandable by 3-4 year olds

**Content:** Simple narratives, common objects, basic emotions and actions

## Example Outputs

**Prompt:** "Once upon a time, there was a little girl named Lily."

**Generation (temperature=0.8, top_p=0.9):**

```
She loved to play outside in the park. One day, she saw a big, red ball.
She wanted to play with it, but it was too high. Lily's mom said, "Let's
go get it together!" They worked together and got the ball down. Lily was
so happy! She played with the ball all day long.
```

## Citation

If you use this model or the llama2.c implementation, please cite:

```bibtex
@misc{llama2c,
  author = {Andrej Karpathy},
  title = {llama2.c: Inference Llama 2 in one file of pure C},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/karpathy/llama2.c}
}

@article{eldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}
```

## License

MIT License - See the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Model architecture and training code adapted from [llama2.c](https://github.com/karpathy/llama2.c) by Andrej Karpathy
- Trained on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories) by Ronen Eldan and Yuanzhi Li
- Based on the Llama 2 architecture by Meta AI