NSA-117M-Byte-SFT

This is a fine-tuned version of the NSA-117M byte-level model, trained on the Alpaca instruction-following dataset.

Model Details

  • Base model: seconds-0/nsa-117m-byte
  • Training: 20,000 steps on the Alpaca dataset
  • Tokenizer: Byte-level (vocabulary size 256; see the sketch below)
  • Architecture: Native Sparse Attention (NSA)
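
Because the tokenizer is byte-level, sequence length tracks UTF-8 byte count rather than word count. The following is a minimal illustration of that, not output taken from this card; the exact count may differ slightly if the tokenizer adds special tokens.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)

text = "What is the capital of France?"
ids = tokenizer(text)["input_ids"]

# With a 256-entry byte vocabulary, the token count is roughly the number
# of UTF-8 bytes in the input (plus any special tokens).
print(len(text.encode("utf-8")))  # 30 bytes
print(len(ids))                   # expected to be about the same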

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model and its byte-level tokenizer.
model = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)

# The model was instruction-tuned, so prompt it in a System/User/Assistant format.
prompt = "System: You are a helpful assistant. Answer briefly and clearly.\nUser: What is the capital of France?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 50 new tokens (bytes) without tracking gradients.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
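
Note that generate returns the prompt tokens followed by the continuation, so the decoded response above includes the prompt text. To print only the model's reply, you can continue the example by slicing off the prompt:

# Decode only the newly generated tokens, skipping the echoed prompt.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)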

Training Details

  • Loss: 8.46 (final)
  • Hardware: NVIDIA H100 80GB
  • Training time: ~1.3 hours
  • LoRA rank: 16 (see the configuration sketch below)
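
The card only states the LoRA rank, so the sketch below shows how a rank-16 LoRA fine-tune of the base model could be set up with the peft library. The target module names, alpha, and dropout values are assumptions for illustration, not the actual training configuration.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte", trust_remote_code=True)

# Rank 16 matches the card; every other value here is an assumed placeholder.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,                         # assumed
    lora_dropout=0.05,                     # assumed
    target_modules=["q_proj", "v_proj"],   # assumed module names for the attention blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()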

Note

This model uses byte-level tokenization, so its outputs may look unusual compared to those of models that use standard subword tokenizers.
