This repository contains the gla-1.3B-100B model, a 1.3B-parameter variant trained on 100B tokens, presented in the paper Gated Linear Attention Transformers with Hardware-Efficient Training.
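At its core, a GLA layer replaces softmax attention with a data-dependent gated linear recurrence over a matrix-valued state. The snippet below is a minimal, naive recurrent sketch of that update for a single head; the released model uses the hardware-efficient chunkwise kernels described in the paper, and the function and tensor names here are illustrative assumptions rather than the library's API:

import torch

def gated_linear_attention(q, k, v, alpha):
    # Naive recurrent sketch (illustration only; single head, no batch dimension):
    #   S_t = diag(alpha_t) @ S_{t-1} + outer(k_t, v_t)
    #   o_t = q_t @ S_t
    # Assumed shapes: q, k, alpha: (seq_len, d_k); v: (seq_len, d_v); alpha in (0, 1).
    seq_len, d_k = q.shape
    d_v = v.shape[-1]
    S = q.new_zeros(d_k, d_v)          # recurrent matrix-valued state
    outputs = []
    for t in range(seq_len):
        # Decay the state with the per-dimension forget gate, then write the new key/value pair.
        S = alpha[t].unsqueeze(-1) * S + torch.outer(k[t], v[t])
        outputs.append(q[t] @ S)       # read out with the query
    return torch.stack(outputs)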
This model can be easily loaded and used for text generation with the Hugging Face transformers library:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model
model_id = "fla-hub/gla-1.3B-100B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
# Example for text generation
prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt")
# Generate text
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50, top_p=0.95, temperature=0.7)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
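For faster sampling, the model can also be moved to a GPU and run in half precision. The following is a usage sketch that reuses the model and tokenizer loaded above and assumes a CUDA device is available (it falls back to CPU and float32 otherwise):

import torch

# Pick a device and dtype: half precision on GPU, full precision on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
model = model.to(device=device, dtype=dtype)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50, top_p=0.95, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))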
If you find this work useful, please consider citing the original paper:
Gated Linear Attention Transformers with Hardware-Efficient Training
@article{yang2023gated,
  title={Gated Linear Attention Transformers with Hardware-Efficient Training},
  author={Yang, Songlin and Wang, Bailin and Shen, Yikang and Panda, Rameswar and Kim, Yoon},
  journal={arXiv preprint arXiv:2312.06635},
  year={2023}
}
The official codebase for the models and research, including training scripts and other checkpoints, can be found on GitHub: