---
license: mit
---

# gated-deltanet-swa-0.4B-10B

Gated DeltaNet + sliding-window attention (0.4B params, 10B tokens)

## Overview

* **Training**: gated-deltanet-swa-0.4B-10B was trained on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu), which is released under [ODC-By v1.0](https://opendatacommons.org/licenses/by/1-0/)
* **Parameters**: 0.4B
* **Task**: Language modeling
* **Framework**: HuggingFace, [flash-linear-attention](https://github.com/fla-org/flash-linear-attention)
* **Output structure**: [batch_size, sequence_length, vocab_size], i.e. one logit per vocabulary entry at each position (see the shape-check sketch below)
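
As a quick check of the output structure above, the following sketch runs a single forward pass and prints the logit shape. This is a minimal illustration, assuming flash-linear-attention is installed and that `path_to_model` points at a checkpoint (see Running Code below):

```python
# Sketch: verify the [batch_size, sequence_length, vocab_size] logit shape.
import fla  # registers the model classes with transformers' Auto* loaders
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path_to_model = "..."  # fill in: local checkpoint directory or Hub repository ID
model = AutoModelForCausalLM.from_pretrained(path_to_model).cuda()
tokenizer = AutoTokenizer.from_pretrained(path_to_model)

input_ids = tokenizer("All human beings are", return_tensors="pt").input_ids.cuda()
with torch.no_grad():
    logits = model(input_ids).logits
print(logits.shape)  # torch.Size([1, sequence_length, vocab_size])
```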

## Performance

Results on various benchmarks are reported in the paper cited below.

## Running Code

* Minimal code to instantiate the model and perform inference:

```python
# Requires flash-linear-attention (https://github.com/fla-org/flash-linear-attention);
# importing fla registers the model classes with transformers' Auto* loaders.
import fla

from transformers import AutoModelForCausalLM, AutoTokenizer

path_to_model = "..."  # fill in: local checkpoint directory or Hub repository ID
model = AutoModelForCausalLM.from_pretrained(path_to_model).cuda()
tokenizer = AutoTokenizer.from_pretrained(path_to_model)  # tokenizers have no .cuda()

# Inputs must be on the same device as the model.
input_ids = tokenizer("All human beings are", return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=15)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
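
For longer continuations, the standard `generate` sampling arguments work as usual; the values below are illustrative, not tuned settings from the paper:

```python
# Sampled generation, reusing `model`, `tokenizer`, and `input_ids` from above.
outputs = model.generate(
    input_ids,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,        # illustrative nucleus-sampling threshold
    temperature=0.8,   # illustrative temperature
    pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```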

## License

gated-deltanet-swa-0.4B-10B is released under the [MIT License](LICENSE.txt).

## Citation

If you find our work useful, please cite the following publication:

```bibtex
@misc{he_alleviating_2025,
  title = {Alleviating {Forgetfulness} of {Linear} {Attention} by {Hybrid} {Sparse} {Attention} and {Contextualized} {Learnable} {Token} {Eviction}},
  url = {http://arxiv.org/abs/2510.20787},
  doi = {10.48550/arXiv.2510.20787},
  publisher = {arXiv},
  author = {He, Mutian and Garner, Philip N.},
  month = oct,
  year = {2025},
  note = {arXiv:2510.20787 [cs]},
}
```