This repository contains the gla-1.3B-100B model, a 1.3B-parameter variant trained on 100B tokens, presented in the paper Gated Linear Attention Transformers with Hardware-Efficient Training.
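At its core, a GLA layer replaces softmax attention with a data-dependent gated linear recurrence over a matrix-valued state. The snippet below is a minimal, naive recurrent sketch of that update for a single head; the released model uses the hardware-efficient chunkwise kernels described in the paper, and the function and tensor names here are illustrative assumptions rather than the library's API:

import torch

def gated_linear_attention(q, k, v, alpha):
    # Naive recurrent sketch (illustration only; single head, no batch dimension):
    #   S_t = diag(alpha_t) @ S_{t-1} + outer(k_t, v_t)
    #   o_t = q_t @ S_t
    # Assumed shapes: q, k, alpha: (seq_len, d_k); v: (seq_len, d_v); alpha in (0, 1).
    seq_len, d_k = q.shape
    d_v = v.shape[-1]
    S = q.new_zeros(d_k, d_v)          # recurrent matrix-valued state
    outputs = []
    for t in range(seq_len):
        # Decay the state with the per-dimension forget gate, then write the new key/value pair.
        S = alpha[t].unsqueeze(-1) * S + torch.outer(k[t], v[t])
        outputs.append(q[t] @ S)       # read out with the query
    return torch.stack(outputs)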
This model can be easily loaded and used for text generation with the Hugging Face transformers library:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model
model_id = "fla-hub/gla-1.3B-100B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
# Example for text generation
prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt")
# Generate text
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50, top_p=0.95, temperature=0.7)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
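For faster sampling, the model can also be moved to a GPU and run in half precision. The following is a usage sketch that reuses the model and tokenizer loaded above and assumes a CUDA device is available (it falls back to CPU and float32 otherwise):

import torch

# Pick a device and dtype: half precision on GPU, full precision on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
model = model.to(device=device, dtype=dtype)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50, top_p=0.95, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))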
If you find this work useful, please consider citing the original paper:
Gated Linear Attention Transformers with Hardware-Efficient Training
@article{yang2023gated,
  title={Gated Linear Attention Transformers with Hardware-Efficient Training},
  author={Yang, Songlin and Wang, Bailin and Shen, Yikang and Panda, Rameswar and Kim, Yoon},
  journal={arXiv preprint arXiv:2312.06635},
  year={2023}
}
The official codebase for the models and research, including training scripts and other checkpoints, can be found on GitHub: