ScentLLaMA

A tiny LLaMA-based language model with ~600k parameters, pretrained on the synthetic ScentSet dataset (572k entries, ~15M tokens).
It is designed exclusively to describe and classify smells and aromas.

Model Details

  • Parameters: ~607,000 (F32 safetensors; see the quick check after this list)
  • Task: Text generation of smell descriptions
  • Training data: ScentSet (synthetic dataset of smell descriptions)
  • Training date: July 2025
  • License: CC BY 4.0
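
A minimal sketch for verifying the parameter count locally. It uses only the model name from this card and standard transformers/PyTorch calls; nothing model-specific is assumed:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("sixf0ur/ScentLLaMA")

# Sum element counts over all weight tensors; should print roughly 607k.
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")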

📉 Training & Evaluation Loss

The following plot shows the training and evaluation loss over time.
Training was performed for approximately 160,000 steps.

The evaluation loss remains consistently close to the training loss throughout training (within ~0.01),
indicating that the model generalizes well and shows no signs of overfitting.

[Figure: training and evaluation loss curves]
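
For reference, a minimal sketch of how a comparable evaluation loss can be computed on held-out text. The sample sentence below is illustrative, not taken from ScentSet:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sixf0ur/ScentLLaMA")
model = AutoModelForCausalLM.from_pretrained("sixf0ur/ScentLLaMA")
model.eval()

text = "A warm, smoky aroma with notes of cedar."  # hypothetical held-out sample
enc = tokenizer(text, return_token_type_ids=False, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss,
    # the same quantity plotted in the curves above.
    out = model(input_ids=enc["input_ids"], labels=enc["input_ids"])
print(f"eval loss: {out.loss.item():.4f}")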

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sixf0ur/ScentLLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode the prompt and generate a short continuation (greedy decoding by default).
prompt = "A fresh and fruity aroma with hints of"
inputs = tokenizer(prompt, return_token_type_ids=False, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=25)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# > A fresh and fruity aroma with hints of green leaves and a hint of something earthy. It is a ripe plum.
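
The call above decodes greedily. For more varied descriptions, sampling can be enabled with the standard generate() flags; the specific values below are illustrative, not tuned for this model:

# Sampled generation: temperature and nucleus (top-p) filtering add variety.
outputs = model.generate(
    **inputs,
    max_new_tokens=25,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))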

Citation

@misc{ScentLLaMA_2025,
  author       = {David S.},
  title        = {ScentLLaMA: A tiny LLaMA Model for Smell Description Generation},
  year         = {2025},
  publisher    = {Hugging Face Models},
  howpublished = {\url{https://huggingface.co/sixf0ur/ScentLLaMA}},
  note         = {Pretrained on the ScentSet dataset to generate natural language descriptions of smells}
}