NSA-117M-Byte-SFT

This is a fine-tuned version of the NSA-117M byte-level model, trained on the Alpaca instruction-following dataset.

Model Details

  • Base model: seconds-0/nsa-117m-byte
  • Training: 20,000 steps on the Alpaca dataset
  • Tokenizer: Byte-level (vocabulary size 256; see the sketch below)
  • Architecture: Native Sparse Attention (NSA)
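
Because the tokenizer is byte-level, sequence length tracks UTF-8 byte count rather than word count. The following is a minimal illustration of that, not output taken from this card; the exact count may differ slightly if the tokenizer adds special tokens.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)

text = "What is the capital of France?"
ids = tokenizer(text)["input_ids"]

# With a 256-entry byte vocabulary, the token count is roughly the number
# of UTF-8 bytes in the input (plus any special tokens).
print(len(text.encode("utf-8")))  # 30 bytes
print(len(ids))                   # expected to be about the same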

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model and its byte-level tokenizer.
model = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)

# The model was instruction-tuned, so prompt it in a System/User/Assistant format.
prompt = "System: You are a helpful assistant. Answer briefly and clearly.\nUser: What is the capital of France?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 50 new tokens (bytes) without tracking gradients.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
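
Note that generate returns the prompt tokens followed by the continuation, so the decoded response above includes the prompt text. To print only the model's reply, you can continue the example by slicing off the prompt:

# Decode only the newly generated tokens, skipping the echoed prompt.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)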

Training Details

  • Loss: 8.46 (final)
  • Hardware: NVIDIA H100 80GB
  • Training time: ~1.3 hours
  • LoRA rank: 16 (see the configuration sketch below)
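
The card only states the LoRA rank, so the sketch below shows how a rank-16 LoRA fine-tune of the base model could be set up with the peft library. The target module names, alpha, and dropout values are assumptions for illustration, not the actual training configuration.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte", trust_remote_code=True)

# Rank 16 matches the card; every other value here is an assumed placeholder.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,                         # assumed
    lora_dropout=0.05,                     # assumed
    target_modules=["q_proj", "v_proj"],   # assumed module names for the attention blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()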

Note

This model uses byte-level tokenization, so its outputs may look unusual compared to those of models that use standard subword tokenizers.
