---
library_name: transformers
license: mit
base_model: microsoft/Phi-4-multimodal-instruct
tags:
  - generated_from_trainer
model-index:
  - name: Discussion-Phi-4-multimodal-instruct-audio-dimp-tag
    results: []
---

Discussion-Phi-4-multimodal-instruct-audio-dimp-tag

This model is a fine-tuned version of microsoft/Phi-4-multimodal-instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 47.2708
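
Since usage is not yet documented in this card, here is a minimal loading sketch. The repository id is an assumption (it mirrors the model name above; replace it with the actual repo id or a local path), and it assumes the checkpoint, like the base microsoft/Phi-4-multimodal-instruct, requires `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed repository id for this fine-tune; replace with the actual repo id or a local path.
model_id = "Discussion-Phi-4-multimodal-instruct-audio-dimp-tag"

# The base model ships custom processing/modeling code, hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires the `accelerate` package
)
```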

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.95) and epsilon=1e-07; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 3
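
As a rough reconstruction, the values above map onto transformers.TrainingArguments as sketched below. Only the listed hyperparameters come from this card; the output directory and any argument not listed are illustrative assumptions, not taken from the original training script.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; output_dir and anything not listed
# in the card are assumptions.
training_args = TrainingArguments(
    output_dir="Discussion-Phi-4-multimodal-instruct-audio-dimp-tag",
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,  # 1 sample/device x 16 steps = total train batch size 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-07,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=3,
)
```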

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 14.0644       | 0.1117 | 10   | 46.4600         |
| 0.806         | 0.2235 | 20   | 47.0485         |
| 0.8452        | 0.3352 | 30   | 48.5161         |
| 1.1108        | 0.4469 | 40   | 48.2024         |
| 0.5848        | 0.5587 | 50   | 48.0342         |
| 1.5352        | 0.6704 | 60   | 47.5009         |
| 0.8749        | 0.7821 | 70   | 46.9722         |
| 0.6388        | 0.8939 | 80   | 47.6718         |
| 0.1562        | 1.0    | 90   | 47.8837         |
| 0.5959        | 1.1117 | 100  | 47.5622         |
| 1.1912        | 1.2235 | 110  | 47.6055         |
| 0.4967        | 1.3352 | 120  | 47.7299         |
| 0.204         | 1.4469 | 130  | 48.1439         |
| 0.5732        | 1.5587 | 140  | 48.4459         |
| 0.2537        | 1.6704 | 150  | 47.5499         |
| 0.1044        | 1.7821 | 160  | 47.4731         |
| 0.177         | 1.8939 | 170  | 47.1623         |
| 0.131         | 2.0    | 180  | 47.0603         |
| 0.144         | 2.1117 | 190  | 47.2259         |
| 0.0884        | 2.2235 | 200  | 46.9473         |
| 0.5288        | 2.3352 | 210  | 47.2556         |
| 0.483         | 2.4469 | 220  | 47.6419         |
| 0.0972        | 2.5587 | 230  | 47.6346         |
| 0.0148        | 2.6704 | 240  | 47.5911         |
| 0.0224        | 2.7821 | 250  | 47.2943         |
| 0.2197        | 2.8939 | 260  | 47.2708         |

Framework versions

  • Transformers 4.48.2
  • PyTorch 2.4.1+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.1