# NSA-117M-Byte-SFT
This is a version of the NSA-117M byte-level model fine-tuned on the Alpaca dataset.
## Model Details
- Base model: seconds-0/nsa-117m-byte
- Training: 20,000 steps on Alpaca dataset
- Tokenizer: Byte-level (vocabulary size 256; see the check after this list)
- Architecture: Native Sparse Attention (NSA)
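As a quick sanity check, the byte-level vocabulary can be inspected from the model config. This is a minimal sketch and assumes the repository's custom config exposes the standard `vocab_size` attribute.

```python
from transformers import AutoConfig

# Load the custom config from the repo and confirm the byte-level vocabulary.
config = AutoConfig.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)
print(config.vocab_size)  # expected to be 256 for a byte-level vocabulary
```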
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# The NSA architecture is implemented as custom code in the repo, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)

prompt = "System: You are a helpful assistant. Answer briefly and clearly.\nUser: What is the capital of France?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
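If you only want the generated reply rather than the echoed prompt plus completion, you can decode just the new tokens. This is a small optional addition and assumes the custom tokenizer returns a standard `input_ids` tensor, as in the snippet above.

```python
# Decode only the newly generated tokens, dropping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```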
## Training Details
- Loss: 8.46 (final)
- Hardware: NVIDIA H100 80GB
- Training time: ~1.3 hours
- LoRA rank: 16 (see the sketch below)
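The training script is not included here, so the snippet below is only a hypothetical sketch of attaching a rank-16 LoRA adapter with the peft library; the `target_modules` names and `lora_alpha` value are assumptions, not taken from this repository, and depend on the actual NSA projection names.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Start from the base checkpoint named on this card.
base = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte", trust_remote_code=True)

# Hypothetical LoRA configuration matching the reported rank of 16.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,  # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```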
## Note
This model uses byte-level tokenization, so its outputs may look unusual compared to those of models that use standard subword tokenizers.
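To make the byte-level behavior concrete, the illustration below uses plain Python byte encoding (it does not call the repository's tokenizer, whose exact byte-to-id mapping is defined by its custom code): each UTF-8 byte becomes one token, so non-ASCII characters expand to multiple tokens and sequences are longer than with subword tokenizers.

```python
# Byte-level tokenization illustration: one token per UTF-8 byte.
text = "café"
print(list(text.encode("utf-8")))  # [99, 97, 102, 195, 169] -> 4 characters, 5 bytes/tokens
```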