End-to-end Neural Diarization with Encoder-Decoder Based Attractors trained on AMI-headset.
This recipe can be found at `egs2/ami/diar1`.
Configurations:
- Use ESPnet's default frontend to extract features. The sampling rate is 8000 Hz, with a frame length of 25 ms and a frame shift of 10 ms. The frontend extracts 23 log-scaled Mel filterbank features.
- Use a 4-layer stacked Transformer encoder; each layer outputs 256-dimensional frame-wise embeddings.
- Use ESPnet's standard RNN attractor (LSTM) with a hidden size of 256.
- Initial training uses data with 4 speakers for 500 epochs, following `spk4/diar_train_diar_eda_raw_spk4/config.yaml`.
- Adaptation fine-tunes the model on data with 3 and 5 speakers for 20 epochs each, using `spk3/diar_train_diar_eda_raw_spk3/config.yaml` and `spk5/diar_train_diar_eda_raw_spk5/config.yaml`, respectively.
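The frontend settings above translate into sample and frame counts as follows. This is a minimal arithmetic sketch, not ESPnet's frontend code: it assumes centered framing in the style of `torch.stft(center=True)` with an even FFT size (assumed here to match ESPnet's STFT default), and it uses the HTK Mel formula purely for illustration, since ESPnet's filterbank may use a different Mel variant. All function names are hypothetical.

```python
import math
from typing import List

FS = 8000      # sampling rate in Hz
WIN_MS = 25    # frame length in milliseconds
HOP_MS = 10    # frame shift in milliseconds
N_MELS = 23    # number of log-Mel filterbank channels


def ms_to_samples(ms: int, fs: int = FS) -> int:
    """Convert a duration in milliseconds to an integer sample count."""
    return fs * ms // 1000


def num_frames(n_samples: int, centered: bool = True) -> int:
    """Number of feature frames for a signal of n_samples samples.

    Assumption: centered framing pads n_fft // 2 samples on each side; for an
    even n_fft this gives 1 + n_samples // hop frames, independent of n_fft.
    """
    win = ms_to_samples(WIN_MS)  # 200 samples at 8 kHz
    hop = ms_to_samples(HOP_MS)  # 80 samples at 8 kHz
    if centered:
        return 1 + n_samples // hop
    # Without padding, only windows that fit entirely inside the signal count.
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop


def hz_to_mel(f: float) -> float:
    """HTK Mel scale (illustrative; other Mel variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(n_mels: int = N_MELS, fmin: float = 0.0, fmax: float = FS / 2) -> List[float]:
    """Center frequencies (Hz) of n_mels filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]
```

For one second of audio, `num_frames(8000)` gives 101 centered frames (98 without padding), and each frame becomes a 23-dimensional vector with filter centers spread between 0 Hz and the 4000 Hz Nyquist frequency.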
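The encoder and attractor described above meet at a simple scoring step. Following the EEND-EDA formulation, each frame's speaker-activity posterior is the sigmoid of the dot product between its embedding and an attractor, and the speaker count is the number of leading attractors whose existence probability exceeds 0.5. The sketch below assumes the LSTM encoder-decoder has already produced the attractors and their existence logits; it uses 3-dimensional toy vectors instead of 256-dimensional ones, and all names are hypothetical rather than ESPnet's API.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def count_speakers(existence_logits: Vector, threshold: float = 0.5) -> int:
    """Count leading attractors whose existence probability exceeds threshold."""
    n = 0
    for logit in existence_logits:
        if sigmoid(logit) <= threshold:
            break  # stop at the first attractor judged not to exist
        n += 1
    return n


def speaker_posteriors(embeddings: List[Vector], attractors: List[Vector]) -> List[Vector]:
    """Frame-wise speaker activity: sigmoid of embedding-attractor dot products."""
    return [[sigmoid(dot(e, a)) for a in attractors] for e in embeddings]


# Toy example: three frames, three candidate attractors.
embeddings = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 2.0, 0.0]]
attractors = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
existence_logits = [3.0, 2.0, -1.0]

n_spk = count_speakers(existence_logits)
posteriors = speaker_posteriors(embeddings, attractors[:n_spk])
```

Here the third attractor is dropped, and the third frame gets high posteriors for both remaining speakers, which is how EEND-style models represent overlapped speech.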
RESULTS
The following results were obtained with the checkpoint `spk5/diar_train_diar_eda_raw_spk5/20epoch.pth`, evaluated on the 4-speaker test and development sets.
Environments
- date: Thu Dec 19 22:43:37 EST 2024
- python version: 3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0]
- espnet version: espnet 202409
- pytorch version: pytorch 2.4.0
- Git hash: c12b3d59ca4fd8847edf274e56a1716474d2a30e
  - Commit date: Thu Dec 19 21:58:26 2024 -0500
spk4
DER
diarized_test
| Result (threshold / median filter / collar) | DER (%) |
|---|---|
| result_th0.3_med11_collar0.0 | 72.44 |
| result_th0.3_med1_collar0.0 | 74.64 |
| result_th0.4_med11_collar0.0 | 70.60 |
| result_th0.4_med1_collar0.0 | 72.30 |
| result_th0.5_med11_collar0.0 | 70.45 |
| result_th0.5_med1_collar0.0 | 72.02 |
| result_th0.6_med11_collar0.0 | 71.85 |
| result_th0.6_med1_collar0.0 | 73.41 |
| result_th0.7_med11_collar0.0 | 75.56 |
| result_th0.7_med1_collar0.0 | 77.02 |
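The row names appear to encode post-processing settings: `th` is read here as the decision threshold on the speaker-activity posteriors, `med` as the median-filter length in frames (so `med1` means no smoothing), and `collar` as the scoring collar in seconds. Under that reading, 11-frame smoothing lowers test DER at every threshold, and th0.5 with med11 gives the lowest value (70.45). A minimal stdlib sketch of the threshold-then-smooth step for one speaker's posterior track, with hypothetical function names and zero padding at the edges as in `scipy.signal.medfilt`:

```python
from statistics import median
from typing import List


def binarize(posteriors: List[float], threshold: float) -> List[int]:
    """1 where the speaker-activity posterior exceeds threshold, else 0."""
    return [1 if p > threshold else 0 for p in posteriors]


def median_filter(labels: List[int], width: int) -> List[int]:
    """Sliding median over an odd-width window, zero-padded at both edges.

    width=1 returns the input unchanged (the "med1" rows).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    n = len(labels)
    out = []
    for t in range(n):
        # Frames outside the signal count as inactive (0).
        window = [labels[i] if 0 <= i < n else 0 for i in range(t - half, t + half + 1)]
        out.append(int(median(window)))
    return out


# One speaker's track: threshold at 0.5, then smooth with an 11-frame window.
track = [0.1, 0.2, 0.7, 0.3, 0.8, 0.9, 0.6, 0.4, 0.9, 0.8, 0.7, 0.2]
smoothed = median_filter(binarize(track, 0.5), 11)
```

Smoothing removes isolated one- or two-frame flips, which is one plausible reason the `med11` rows score better than the unsmoothed `med1` rows.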
spk4
DER
diarized_dev
| Result (threshold / median filter / collar) | DER (%) |
|---|---|
| result_th0.3_med11_collar0.0 | 74.37 |
| result_th0.3_med1_collar0.0 | 75.96 |
| result_th0.4_med11_collar0.0 | 71.69 |
| result_th0.4_med1_collar0.0 | 72.94 |
| result_th0.5_med11_collar0.0 | 70.83 |
| result_th0.5_med1_collar0.0 | 72.12 |
| result_th0.6_med11_collar0.0 | 71.96 |
| result_th0.6_med1_collar0.0 | 73.34 |
| result_th0.7_med11_collar0.0 | 75.81 |
| result_th0.7_med1_collar0.0 | 76.99 |
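DER sums missed speech, false alarms, and speaker confusion, divided by the total reference speech. The sketch below computes a frame-level version on 0/1 activity matrices (frames × speakers), choosing the best one-to-one speaker mapping by brute force. It illustrates the metric only and is not the recipe's scorer, which works on RTTM files; with 10 ms frames and a 0.0 s collar the two should be close but need not match exactly. The function name is hypothetical.

```python
from itertools import permutations
from typing import List

Frames = List[List[int]]  # frames x speakers, 1 = speaker active


def frame_der(ref: Frames, hyp: Frames) -> float:
    """Frame-level DER (%) under the best one-to-one speaker mapping.

    Per frame: miss = max(0, n_ref - n_hyp), false alarm = max(0, n_hyp - n_ref),
    confusion = min(n_ref, n_hyp) - n_correct, where n_correct counts speakers
    active in both the reference and the mapped hypothesis.
    """
    if not ref or len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same, non-zero number of frames")
    n_spk = max(len(ref[0]), len(hyp[0]))
    # Pad both sides with always-silent speakers so columns can be permuted.
    ref = [row + [0] * (n_spk - len(row)) for row in ref]
    hyp = [row + [0] * (n_spk - len(row)) for row in hyp]
    total_ref = sum(sum(row) for row in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    best = None
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            mapped = [h[perm[s]] for s in range(n_spk)]
            n_ref, n_hyp = sum(r), sum(mapped)
            n_correct = sum(a & b for a, b in zip(r, mapped))
            errors += max(0, n_ref - n_hyp) + max(0, n_hyp - n_ref)
            errors += min(n_ref, n_hyp) - n_correct
        if best is None or errors < best:
            best = errors
    return 100.0 * best / total_ref
```

Because the per-frame error equals `max(n_ref, n_hyp) - n_correct`, the best mapping is the one that maximizes total co-activity, so for larger speaker counts the brute-force search can be replaced by `scipy.optimize.linear_sum_assignment`.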