an-imdb-classifier

This model is a fine-tuned version of distilbert-base-uncased on the stanfordnlp/imdb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3635
  • Accuracy: 0.898

Model description

The model is distilbert-base-uncased fine-tuned for binary sentiment analysis on a subset of the IMDb dataset. It classifies a movie review as either positive or negative.

Intended uses & limitations

This model is intended for use in classifying the sentiment of movie reviews.

It can be used for tasks such as:

  • Automatically categorizing movie reviews on websites or platforms.
  • Analyzing the overall sentiment towards a particular movie.
  • Providing feedback to users based on their review sentiment.
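
A minimal inference sketch for the uses above, assuming the checkpoint is published as nppiech/an-imdb-classifier and that the `transformers` library is installed. The helper names `to_sentiment` and `classify_reviews` are illustrative and not part of the model. Because the label names saved with the checkpoint are not documented here, the helper accepts both the default `LABEL_<id>` form and plain `NEGATIVE`/`POSITIVE` names.

```python
from typing import Dict, List

# Label ids as documented in this card: 0 = negative, 1 = positive.
ID_TO_SENTIMENT = {0: "negative", 1: "positive"}


def to_sentiment(raw_label: str) -> str:
    """Map a pipeline label such as 'LABEL_1' or 'POSITIVE' to a sentiment name."""
    name = raw_label.strip().lower()
    if name in ("negative", "positive"):
        return name
    # Default Transformers naming when no id2label mapping was saved: LABEL_<id>
    return ID_TO_SENTIMENT[int(name.rsplit("_", 1)[-1])]


def classify_reviews(reviews: List[str]) -> List[Dict[str, object]]:
    """Score reviews with the fine-tuned checkpoint (downloads it on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`

    classifier = pipeline("text-classification", model="nppiech/an-imdb-classifier")
    # truncation=True clips reviews longer than the model's maximum input length.
    outputs = classifier(reviews, truncation=True)
    return [
        {"sentiment": to_sentiment(o["label"]), "score": o["score"]} for o in outputs
    ]


# Example call (needs network access to fetch the checkpoint):
# classify_reviews(["A moving, beautifully shot film.", "Two hours I will never get back."])
```

Averaging the per-review results for one title is one simple way to approximate overall sentiment towards a movie.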

Training and evaluation data

The model was fine-tuned on a small subset of the IMDb dataset.

  • Training set size: 5000 examples
  • Evaluation set size: 500 examples

The dataset contains movie reviews labeled as either positive (label 1) or negative (label 0). The distribution of labels in the training set is approximately equal (2494 negative, 2506 positive).
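
The label counts above can be turned into class shares and a majority-class baseline with a few lines of arithmetic. This is only a sanity check on the reported numbers. It assumes the evaluation subset is balanced in roughly the same way, which the card does not state.

```python
# Label counts reported for the 5000-example training subset.
train_counts = {"negative": 2494, "positive": 2506}

total = sum(train_counts.values())
shares = {label: count / total for label, count in train_counts.items()}

# Accuracy of always predicting the most frequent training label,
# assuming the evaluation split has a similar balance (not stated in the card).
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
print(f"majority-class baseline: {majority_baseline:.2%}")
```

A baseline close to 50% means the reported accuracy reflects learned signal rather than class imbalance.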

Training procedure

The model was trained with the Hugging Face Trainer on the tokenized IMDb subset. A preprocess_function tokenized each review and truncated it.
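
The body of preprocess_function is not shown in this card. The sketch below is one plausible form, assuming the reviews live in a `text` column and that the distilbert-base-uncased tokenizer is used with `truncation=True`. The wrappers `make_preprocess_function` and `tokenize_dataset` are hypothetical helpers added for illustration.

```python
from typing import Any, Callable, Dict, List


def make_preprocess_function(tokenizer: Callable[..., Dict[str, Any]]):
    """Build a batched tokenization function suitable for Dataset.map."""

    def preprocess_function(examples: Dict[str, List[str]]) -> Dict[str, Any]:
        # truncation=True clips each review to the tokenizer's maximum length
        # (512 tokens for distilbert-base-uncased).
        return tokenizer(examples["text"], truncation=True)

    return preprocess_function


def tokenize_dataset(dataset):
    """Tokenize a datasets.Dataset that has a 'text' column (assumed layout)."""
    from transformers import AutoTokenizer  # lazy import; requires `transformers`

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    return dataset.map(make_preprocess_function(tokenizer), batched=True)
```

Passing the tokenizer in explicitly keeps the function easy to check with a stand-in tokenizer, without downloading any files.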

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
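
The hyperparameters above determine how many optimizer steps were taken and how the learning rate decayed. The sketch below works that out, assuming a single device, no gradient accumulation, and zero warmup steps (none of which is stated in the list). Under those assumptions it reproduces the step counts 313 and 939 reported in the training results.

```python
import math

train_examples = 5000
train_batch_size = 16
num_epochs = 3
peak_lr = 2e-05

# The last, partial batch of each epoch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_examples / train_batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps)
for epoch in range(num_epochs + 1):
    print(f"after epoch {epoch}: lr = {linear_lr(epoch * steps_per_epoch):.3e}")
```

With a linear schedule the rate falls to two thirds of its peak after the first epoch and reaches zero at the final step.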

Training results

Training Loss   Epoch   Step   Validation Loss   Accuracy
No log          1.0     313    0.3199            0.866
0.2966          2.0     626    0.3023            0.89
0.2966          3.0     939    0.3635            0.898
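
With only 500 evaluation examples, the final accuracy carries noticeable sampling uncertainty. The sketch below estimates it with a normal-approximation (Wald) interval. This is a rough guide that treats the evaluation examples as independent draws, and it is not part of the original evaluation.

```python
import math

eval_examples = 500
accuracy = 0.898  # final-epoch accuracy reported in the training results

correct = round(accuracy * eval_examples)

# Normal-approximation (Wald) standard error of a proportion.
std_err = math.sqrt(accuracy * (1 - accuracy) / eval_examples)
half_width = 1.96 * std_err  # approximate 95% interval

print(f"correct predictions: {correct} / {eval_examples}")
print(f"approx. 95% interval: {accuracy - half_width:.3f} to {accuracy + half_width:.3f}")
```

The interval spans several accuracy points, so the gap between the epoch 2 and epoch 3 accuracies is within noise for an evaluation set of this size.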

Framework versions

  • Transformers 4.55.0
  • Pytorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model size

  • Parameters: 67M
  • Tensor type: F32 (Safetensors)