EraClassifierBiLSTM-4.76M

This model is a compact bidirectional LSTM neural network designed for musical era classification from MIDI data. It achieves the following results on the evaluation set:

  • Loss: 1.0935
  • Accuracy: 0.5852
  • F1: 0.4299

Model description

The EraClassifierBiLSTM-4.76M is a custom bidirectional LSTM neural network specifically designed for classifying musical compositions into historical eras based on MIDI data analysis. This compact model variant (~4.76M parameters) offers a good balance between performance and computational efficiency. For higher accuracy at the cost of compute, see the larger EraClassifierBiLSTM-134M model.

Architecture

  • Model Type: Custom Bidirectional LSTM (BiLSTM)
  • Input: Sequences of 8-dimensional feature vectors extracted from MIDI messages
  • Window Size: 24 MIDI messages per sequence with stride=20 (overlapping windows)
  • Hidden Layers: 2 bidirectional LSTM layers with 384 hidden units each
  • Output: 6-class classification (musical eras)
  • Activation: LeakyReLU with dropout for regularization
  • Loss Function: CrossEntropyLoss
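
The bullets above can be sketched in PyTorch. The classification head (LeakyReLU + dropout over the final timestep's output) is an assumption, since the card does not spell out the head wiring, but with these choices the parameter count lands near the stated ~4.76M:

```python
import torch
import torch.nn as nn

class EraClassifierBiLSTM(nn.Module):
    """Sketch of the described BiLSTM; head details are illustrative."""

    def __init__(self, input_dim=8, hidden_dim=384, num_layers=2,
                 num_classes=6, dropout=0.3):
        super().__init__()
        # 2 bidirectional LSTM layers with 384 hidden units each
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True,
                            dropout=dropout)
        self.act = nn.LeakyReLU()
        self.drop = nn.Dropout(dropout)
        # bidirectional output is 2 * hidden_dim wide per timestep
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):          # x: (batch, 24, 8)
        out, _ = self.lstm(x)      # (batch, 24, 768)
        # classify from the last timestep's bidirectional state
        return self.fc(self.drop(self.act(out[:, -1, :])))

model = EraClassifierBiLSTM()
logits = model(torch.randn(4, 24, 8))   # (4, 6) class logits
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
```

With this configuration the network has roughly 4.76 million parameters, consistent with the model name.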

Feature Engineering

The model processes 8 key MIDI features per message, automatically selected as the most frequent features across the dataset:

Numerical Features (7):

  • channel: MIDI channel number (μ=2.01, σ=2.74)
  • control: Control change values (μ=11.90, σ=17.02)
  • note: Note pitch as a MIDI note number (μ=64.17, σ=12.00)

  • tempo: Tempo in microseconds per beat (μ=738221.63, σ=460369.34)
  • time: Timing information in ticks (μ=714.28, σ=1337451.38)
  • value: Generic value field (μ=83.91, σ=26.72)
  • velocity: Note velocity/intensity (μ=42.80, σ=44.24)

Categorical Features (1):

  • type: MIDI message type (mapped to numerical IDs)

All numerical features are normalized using dataset statistics (mean and standard deviation), while categorical features are encoded using learned ID mappings.
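
This preprocessing can be illustrated as follows. `FEATURE_STATS` reuses the dataset statistics listed above, while `TYPE_IDS` is a hypothetical stand-in for the learned categorical ID mapping:

```python
# (mean, std) pairs taken from the dataset statistics listed above
FEATURE_STATS = {
    "channel":  (2.01, 2.74),
    "control":  (11.90, 17.02),
    "note":     (64.17, 12.00),
    "tempo":    (738221.63, 460369.34),
    "time":     (714.28, 1337451.38),
    "value":    (83.91, 26.72),
    "velocity": (42.80, 44.24),
}
# Hypothetical learned ID mapping for the categorical "type" feature
TYPE_IDS = {"note_on": 0, "note_off": 1, "control_change": 2, "set_tempo": 3}

def message_to_vector(msg: dict) -> list:
    """Convert one MIDI message (as a dict of fields) to an 8-dim vector."""
    vec = []
    for name, (mean, std) in FEATURE_STATS.items():
        raw = msg.get(name, mean)          # assumed: absent fields fall back to the mean
        vec.append((raw - mean) / std)     # z-score normalization
    # append the categorical type ID (unknown types get a reserved ID)
    vec.append(float(TYPE_IDS.get(msg.get("type"), len(TYPE_IDS))))
    return vec

v = message_to_vector({"type": "note_on", "note": 60, "velocity": 80,
                       "channel": 0, "time": 0})
```

In practice the raw fields would come from a MIDI parser such as mido; here a plain dict stands in for a parsed message.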

Training Approach

The model uses a sliding window approach to capture temporal patterns in musical structure that are characteristic of different historical periods. Each MIDI file is processed into multiple overlapping sequences, allowing the model to learn both local and global musical patterns.
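
The windowing described above can be sketched as a simple slicing routine. Window size and stride follow the card; the handling of files shorter than one window is an assumption:

```python
def sliding_windows(messages, window=24, stride=20):
    """Split a file's message sequence into overlapping fixed-size windows."""
    if len(messages) < window:
        return []                      # assumed: too-short files are skipped
    return [messages[i:i + window]
            for i in range(0, len(messages) - window + 1, stride)]

windows = sliding_windows(list(range(100)))
# window starts fall at offsets 0, 20, 40, 60; with window=24 and stride=20,
# adjacent windows share 4 messages
```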

Intended uses & limitations

Intended Uses

  • Musicological Research: Analyzing historical trends in musical composition
  • Educational Tools: Teaching music history through automated era identification
  • Digital Music Libraries: Automatic categorization and organization of MIDI collections
  • Music Analysis: Understanding stylistic characteristics across different periods
  • Content Recommendation: Suggesting music from similar historical periods

Limitations

  • Performance Variability: The model shows significant performance differences across eras:
    • Strong performance on Romantic (82.6%) and Baroque (66.6%) eras
    • Moderate performance on Renaissance (45.4%) and Modern (37.0%) eras
    • Poor performance on Classical (12.5%) and Other (14.2%) categories
  • Era Confusion: Adjacent historical periods are frequently confused:
    • Renaissance music often misclassified as Baroque (36.7%)
    • Classical music heavily confused with Baroque (37.7%) and Romantic (34.1%)
    • Modern music often misclassified as Romantic (35.9%)
  • Data Dependencies: Performance depends on the quality and representativeness of the training data
  • MIDI-Only: Limited to MIDI format; cannot process audio recordings or sheet music
  • Cultural Bias: Training data may reflect Western classical music traditions

Below is the confusion matrix for the best-performing checkpoint, visually highlighting these misclassifications:

[Confusion matrix image]

Recommendations for Use

  • Validate results with musicological expertise, especially for Classical period identification
  • Use confidence thresholds to filter low-confidence predictions
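
A hypothetical helper for the thresholding recommendation, treating the top softmax probability as a confidence score (the 0.6 cutoff is illustrative, not tuned):

```python
import torch
import torch.nn.functional as F

def predict_with_threshold(logits: torch.Tensor, threshold: float = 0.6):
    """Return era indices, or -1 where the top softmax probability is low."""
    probs = F.softmax(logits, dim=-1)
    conf, preds = probs.max(dim=-1)
    # replace low-confidence predictions with a sentinel value
    return torch.where(conf >= threshold, preds, torch.full_like(preds, -1))

preds = predict_with_threshold(torch.tensor([[4.0, 0.1, 0.1, 0.1, 0.1, 0.1]]))
```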

Training and evaluation data

Dataset

  • Source: TiMauzi/imslp-midi-by-sa (International Music Score Library Project)
  • Format: MIDI files with associated metadata including composition year and era
  • Preprocessing: MIDI messages converted to 8-dimensional feature vectors
  • Window Strategy: 24-message windows with 20-message stride for overlapping sequences

Musical Eras Covered

  1. Renaissance (1400-1600): Early polyphonic music, madrigals, motets
  2. Baroque (1600-1750): Ornamented music, basso continuo, fugues
  3. Classical (1750-1820): Clear forms, balanced phrases, sonata form
  4. Romantic (1820-1900): Expressive, emotional, expanded forms
  5. Modern (1900-present): Atonal, experimental, diverse styles
  6. Other: Miscellaneous or unclear period classifications

During inference, the class indices 0 through 5 correspond to the eras in the order listed above.
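
Written out, under the assumption that the index order follows the numbered list above:

```python
# Index-to-era mapping, assumed to follow the order listed above (0-5)
ERA_LABELS = ["Renaissance", "Baroque", "Classical",
              "Romantic", "Modern", "Other"]

def era_name(class_index: int) -> str:
    """Translate a predicted class index into its era label."""
    return ERA_LABELS[class_index]
```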

Data Distribution

The model was trained on 6,992 MIDI files from the IMSLP dataset with the following era distribution:

  • Romantic: 2,722 samples (38.9%) - median year 1854
  • Baroque: 1,874 samples (26.8%) - median year 1710
  • Renaissance: 843 samples (12.1%) - median year 1611
  • Modern: 763 samples (10.9%) - median year 2020
  • Classical: 597 samples (8.5%) - median year 1779
  • Other: 193 samples (2.8%) - median year 1909 (includes Early 20th century and Medieval)

Era thresholding was applied (minimum 150 samples per era), with rare eras like "Early 20th century" (125 samples) and "Medieval" (5 samples) mapped to the "Other" category to maintain classification stability.

Evaluation Strategy

  • Validation: Performance measured on held-out validation set
  • Test Set: Final evaluation on completely unseen test data
  • Metrics: Accuracy, F1-score (macro-averaged), and confusion matrix analysis
  • Training Duration: 5 epochs (~96,000 training steps), with the best checkpoint restored based on validation F1 score (early stopping)
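
Macro-averaged F1, the model-selection metric, weights all six eras equally regardless of class frequency. A self-contained sketch, equivalent to scikit-learn's `f1_score(..., average="macro")`:

```python
def macro_f1(y_true, y_pred, num_classes=6):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / num_classes

score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Because rare classes count as much as frequent ones, the macro F1 (0.4299) sits well below the accuracy (0.5852) on this imbalanced dataset.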

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4.761974698772928e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: reduce_lr_on_plateau
  • num_epochs: 5
  • mixed_precision_training: Native AMP
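
In PyTorch these hyperparameters translate roughly as follows. The scheduler's monitored metric and patience are assumptions; the card only names `reduce_lr_on_plateau`:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 6)  # stand-in for the BiLSTM
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=4.761974698772928e-05,
                              betas=(0.9, 0.999), eps=1e-08)
# reduce_lr_on_plateau: lower the LR when the monitored metric stalls;
# mode="max" assumes a metric that should increase (e.g. validation F1),
# stepped with scheduler.step(val_f1) after each evaluation
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max")
```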

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | F1     |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:------:|
| 1.2797        | 0.1031 | 2000  | 1.3522          | 0.4608   | 0.2486 |
| 1.1521        | 0.2063 | 4000  | 1.2422          | 0.4987   | 0.3139 |
| 1.0887        | 0.3094 | 6000  | 1.2189          | 0.5056   | 0.3223 |
| 1.0432        | 0.4126 | 8000  | 1.1715          | 0.5252   | 0.3479 |
| 1.019         | 0.5157 | 10000 | 1.2021          | 0.5150   | 0.3304 |
| 0.9963        | 0.6188 | 12000 | 1.1789          | 0.5252   | 0.3487 |
| 0.976         | 0.7220 | 14000 | 1.1151          | 0.5759   | 0.3983 |
| 0.9544        | 0.8251 | 16000 | 1.1800          | 0.5299   | 0.3529 |
| 0.9455        | 0.9283 | 18000 | 1.1866          | 0.5415   | 0.3662 |
| 0.9276        | 1.0314 | 20000 | 1.1744          | 0.5350   | 0.3792 |
| 0.9167        | 1.1345 | 22000 | 1.1032          | 0.5774   | 0.4120 |
| 0.9084        | 1.2377 | 24000 | 1.1312          | 0.5553   | 0.3818 |
| 0.8758        | 1.3408 | 26000 | 1.1042          | 0.5667   | 0.4109 |
| 0.859         | 1.4440 | 28000 | 1.1065          | 0.5733   | 0.4125 |
| 0.8607        | 1.5471 | 30000 | 1.1104          | 0.5695   | 0.4115 |
| 0.8526        | 1.6503 | 32000 | 1.1011          | 0.5830   | 0.4255 |
| 0.8559        | 1.7534 | 34000 | 1.1083          | 0.5765   | 0.4136 |
| 0.8501        | 1.8565 | 36000 | 1.1113          | 0.5752   | 0.4163 |
| 0.8497        | 1.9597 | 38000 | 1.0935          | 0.5775   | 0.4220 |
| 0.8473        | 2.0628 | 40000 | 1.1092          | 0.5745   | 0.4181 |
| 0.8441        | 2.1660 | 42000 | 1.1095          | 0.5733   | 0.4164 |
| 0.8396        | 2.2691 | 44000 | 1.0935          | 0.5852   | 0.4299 |
| 0.8391        | 2.3722 | 46000 | 1.1054          | 0.5744   | 0.4160 |
| 0.8401        | 2.4754 | 48000 | 1.1008          | 0.5755   | 0.4198 |
| 0.8327        | 2.5785 | 50000 | 1.1097          | 0.5712   | 0.4132 |
| 0.838         | 2.6817 | 52000 | 1.1055          | 0.5720   | 0.4143 |
| 0.8329        | 2.7848 | 54000 | 1.1055          | 0.5728   | 0.4165 |
| 0.8346        | 2.8879 | 56000 | 1.1038          | 0.5743   | 0.4172 |
| 0.8353        | 2.9911 | 58000 | 1.1090          | 0.5728   | 0.4167 |
| 0.8385        | 3.0942 | 60000 | 1.1013          | 0.5755   | 0.4201 |
| 0.8337        | 3.1974 | 62000 | 1.1088          | 0.5733   | 0.4163 |
| 0.8256        | 3.3005 | 64000 | 1.1076          | 0.5748   | 0.4177 |
| 0.8367        | 3.4036 | 66000 | 1.1066          | 0.5730   | 0.4159 |
| 0.831         | 3.5068 | 68000 | 1.1083          | 0.5732   | 0.4164 |
| 0.8283        | 3.6099 | 70000 | 1.1067          | 0.5744   | 0.4173 |
| 0.8349        | 3.7131 | 72000 | 1.1058          | 0.5747   | 0.4180 |
| 0.8313        | 3.8162 | 74000 | 1.1058          | 0.5741   | 0.4171 |
| 0.8313        | 3.9193 | 76000 | 1.1065          | 0.5735   | 0.4169 |
| 0.8309        | 4.0225 | 78000 | 1.1067          | 0.5736   | 0.4171 |
| 0.8331        | 4.1256 | 80000 | 1.1055          | 0.5744   | 0.4174 |
| 0.8371        | 4.2288 | 82000 | 1.1058          | 0.5735   | 0.4167 |
| 0.8344        | 4.3319 | 84000 | 1.1060          | 0.5734   | 0.4166 |
| 0.8291        | 4.4350 | 86000 | 1.1049          | 0.5747   | 0.4185 |
| 0.8343        | 4.5382 | 88000 | 1.1053          | 0.5735   | 0.4171 |
| 0.8293        | 4.6413 | 90000 | 1.1056          | 0.5736   | 0.4174 |
| 0.8294        | 4.7445 | 92000 | 1.1056          | 0.5736   | 0.4174 |
| 0.8316        | 4.8476 | 94000 | 1.1055          | 0.5736   | 0.4174 |
| 0.8264        | 4.9508 | 96000 | 1.1056          | 0.5736   | 0.4174 |

Training Analysis

Below is the full training metrics plot, showing loss, accuracy, and F1-score trends over the entire training process:

[Training metrics plot]

The training shows stable convergence with the model reaching its best performance around step 44,000 (epoch 2.27). The training loss decreases steadily while validation metrics stabilize, indicating good generalization without severe overfitting. The model achieves its peak F1 score of 0.4299 at step 44,000, which was selected as the best checkpoint.

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu126
  • Datasets 3.3.2
  • Tokenizers 0.21.0