# ScentLLaMA
A tiny LLaMA-based language model with 600k parameters, pretrained specifically on the synthetic ScentSet dataset (572k entries, ~15M tokens).
Designed exclusively to describe and classify smells and aromas.
## Model Details
- Parameters: ~600,000 (see the config sketch after this list)
- Task: Text generation of smell descriptions
- Training data: ScentSet (synthetic dataset of smell descriptions)
- Training date: July 2025
- License: CC BY 4.0
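To make the ~600k-parameter figure concrete, here is a minimal sketch of a LLaMA configuration at roughly that scale. The hyperparameter values are illustrative assumptions, not the released ScentLLaMA config; they are chosen only to land in the right parameter ballpark.

```python
# A hypothetical ~600k-parameter LLaMA config. These values are assumptions,
# not the actual ScentLLaMA hyperparameters.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=4096,            # assumed small, domain-specific vocabulary
    hidden_size=64,
    intermediate_size=256,
    num_hidden_layers=5,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=256,
    tie_word_embeddings=True,   # share input/output embeddings to save parameters
)
model = LlamaForCausalLM(config)

# Counts each shared parameter once; prints 590,528 with these values,
# i.e. in the same ballpark as the stated ~600,000.
print(sum(p.numel() for p in model.parameters()))
```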
## Training & Evaluation Loss
The following plot shows the training and evaluation loss over time. Training ran for approximately 160,000 steps. The evaluation loss stays within ~0.01 of the training loss throughout, indicating that the model generalizes well and shows no signs of overfitting.
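If training was run with the Hugging Face `Trainer` (an assumption; the card does not say), a plot like the one above can be reproduced from the `trainer_state.json` file saved alongside checkpoints:

```python
# Sketch: rebuild a train/eval loss plot from a Trainer checkpoint's log.
# The path is hypothetical; point it at an actual checkpoint directory.
import json
import matplotlib.pyplot as plt

with open("trainer_state.json") as f:
    history = json.load(f)["log_history"]

# Training and evaluation losses are logged as separate entries.
train = [(h["step"], h["loss"]) for h in history if "loss" in h]
evals = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train), label="train loss")
plt.plot(*zip(*evals), label="eval loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```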
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sixf0ur/ScentLLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A fresh and fruity aroma with hints of"
# Disable token_type_ids, which model.generate() does not use.
inputs = tokenizer(prompt, return_token_type_ids=False, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=25)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# > A fresh and fruity aroma with hints of green leaves and a hint of something earthy. It is a ripe plum.
```
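Greedy decoding (the default above) always returns the same continuation for a given prompt. For more varied descriptions, sampling can be enabled; the `temperature` and `top_p` values below are illustrative, not tuned for this model.

```python
# Sampled generation for more diverse smell descriptions.
outputs = model.generate(
    **inputs,
    max_new_tokens=25,
    do_sample=True,     # sample from the distribution instead of taking argmax
    temperature=0.8,    # illustrative value
    top_p=0.95,         # illustrative value
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```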
## Citation
```bibtex
@misc{ScentLLaMA_2025,
  author       = {David S.},
  title        = {ScentLLaMA: A tiny LLaMA Model for Smell Description Generation},
  year         = {2025},
  publisher    = {Hugging Face Models},
  howpublished = {\url{https://huggingface.co/sixf0ur/ScentLLaMA}},
  note         = {Pretrained on the ScentSet dataset to generate natural language descriptions of smells}
}
```