---
language:
- en
---

End-to-end Neural Diarization with Encoder-Decoder Based Attractors (EEND-EDA), trained on AMI-headset.
This example can be found at `egs2/ami/diar1`.

## Configurations:

- Use ESPnet's default frontend to extract features. The sampling rate is 8000 Hz, with a frame length of 25 ms and a frame shift of 10 ms. The frontend extracts 23 log-scaled Mel-filterbank features.
- Use a stack of 4 Transformer encoder layers, each producing 256-dimensional frame-wise embeddings.
- Use ESPnet's standard RNN attractor (LSTM) with a hidden size of 256.
- Initial training uses 4-speaker data for 500 epochs, following `spk4/diar_train_diar_eda_raw_spk4/config.yaml`.
- Adaptation fine-tunes the model on 3-speaker and 5-speaker data for 20 epochs each, using `spk3/diar_train_diar_eda_raw_spk3/config.yaml` and `spk5/diar_train_diar_eda_raw_spk5/config.yaml` respectively.
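
The frontend settings above can be turned into sample counts with a little arithmetic. The following is a minimal sketch, not ESPnet code: the helper names `frame_params` and `num_frames` are hypothetical, and the frame-count formula assumes no padding at the signal edges, which may differ from what the actual frontend does.

```python
def frame_params(sample_rate_hz, frame_length_ms, frame_shift_ms):
    """Convert frame length and shift from milliseconds to samples."""
    win = int(round(sample_rate_hz * frame_length_ms / 1000))
    hop = int(round(sample_rate_hz * frame_shift_ms / 1000))
    return win, hop


def num_frames(num_samples, win, hop):
    """Number of full frames, assuming no edge padding (an assumption)."""
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


# Values from the configuration list: 8000 Hz, 25 ms window, 10 ms shift.
win, hop = frame_params(8000, 25, 10)
print(win, hop)                    # 200 80
print(num_frames(8000, win, hop))  # 98 frames for one second of audio
```

Under these assumptions, each frame covers 200 samples, consecutive frames start 80 samples apart, and every frame is represented by 23 log-Mel values.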

## RESULTS

The following results were obtained using the checkpoint `spk5/diar_train_diar_eda_raw_spk5/20epoch.pth`, evaluated on the 4-speaker test and development sets.

### Environments
- date: `Thu Dec 19 22:43:37 EST 2024`
- python version: `3.11.10 (main, Oct  3 2024, 07:29:13) [GCC 11.2.0]`
- espnet version: `espnet 202409`
- pytorch version: `pytorch 2.4.0`
- Git hash: `c12b3d59ca4fd8847edf274e56a1716474d2a30e`
  - Commit date: `Thu Dec 19 21:58:26 2024 -0500`

### spk4
#### DER
diarized_test
|threshold_median_collar|DER|
|---|---|
|result_th0.3_med11_collar0.0|72.44|
|result_th0.3_med1_collar0.0|74.64|
|result_th0.4_med11_collar0.0|70.60|
|result_th0.4_med1_collar0.0|72.30|
|result_th0.5_med11_collar0.0|70.45|
|result_th0.5_med1_collar0.0|72.02|
|result_th0.6_med11_collar0.0|71.85|
|result_th0.6_med1_collar0.0|73.41|
|result_th0.7_med11_collar0.0|75.56|
|result_th0.7_med1_collar0.0|77.02|
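
The `th` and `med` fields in the result names appear to correspond to the usual EEND post-processing: frame-wise speaker posteriors are binarized with a threshold and then smoothed with a median filter of the given window length (`med1` meaning no smoothing). The sketch below illustrates that idea for a single speaker in plain Python. It is an assumption about the scoring step, not the recipe's actual code, and the function names and the zero padding at the edges are illustrative choices.

```python
from statistics import median


def binarize(posteriors, threshold):
    """Mark a frame as active when its posterior exceeds the threshold."""
    return [1 if p > threshold else 0 for p in posteriors]


def median_filter(labels, window):
    """Sliding median over an odd-length window, zero-padded at the edges."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [0] * half + list(labels) + [0] * half
    return [int(median(padded[i:i + window])) for i in range(len(labels))]


# Hypothetical posteriors for one speaker over eight frames.
posteriors = [0.1, 0.6, 0.2, 0.7, 0.8, 0.9, 0.4, 0.1]
active = binarize(posteriors, threshold=0.5)
print(active)                     # [0, 1, 0, 1, 1, 1, 0, 0]
print(median_filter(active, 3))   # [0, 0, 1, 1, 1, 1, 0, 0]
```

In this toy input the filter removes the isolated one-frame activation and fills the one-frame gap, which is the kind of smoothing a longer window such as `med11` applies at a coarser scale.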

diarized_dev
|threshold_median_collar|DER|
|---|---|
|result_th0.3_med11_collar0.0|74.37|
|result_th0.3_med1_collar0.0|75.96|
|result_th0.4_med11_collar0.0|71.69|
|result_th0.4_med1_collar0.0|72.94|
|result_th0.5_med11_collar0.0|70.83|
|result_th0.5_med1_collar0.0|72.12|
|result_th0.6_med11_collar0.0|71.96|
|result_th0.6_med1_collar0.0|73.34|
|result_th0.7_med11_collar0.0|75.81|
|result_th0.7_med1_collar0.0|76.99|
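
One common way to read these tables is to pick the post-processing setting on the development set and then report the test DER at that setting. The snippet below is a small sketch of that procedure using the numbers from the two tables; the regular expression and the helper name `parse_name` are hypothetical and only reflect the naming pattern visible in the result names.

```python
import re

# Pattern inferred from names such as "result_th0.5_med11_collar0.0".
NAME_RE = re.compile(
    r"result_th(?P<th>[\d.]+)_med(?P<med>\d+)_collar(?P<collar>[\d.]+)"
)


def parse_name(name):
    """Split a result name into (threshold, median window, collar)."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected result name: {name}")
    return float(m["th"]), int(m["med"]), float(m["collar"])


# DER (%) copied from the diarized_dev and diarized_test tables.
dev_der = {
    "result_th0.3_med11_collar0.0": 74.37, "result_th0.3_med1_collar0.0": 75.96,
    "result_th0.4_med11_collar0.0": 71.69, "result_th0.4_med1_collar0.0": 72.94,
    "result_th0.5_med11_collar0.0": 70.83, "result_th0.5_med1_collar0.0": 72.12,
    "result_th0.6_med11_collar0.0": 71.96, "result_th0.6_med1_collar0.0": 73.34,
    "result_th0.7_med11_collar0.0": 75.81, "result_th0.7_med1_collar0.0": 76.99,
}
test_der = {
    "result_th0.3_med11_collar0.0": 72.44, "result_th0.3_med1_collar0.0": 74.64,
    "result_th0.4_med11_collar0.0": 70.60, "result_th0.4_med1_collar0.0": 72.30,
    "result_th0.5_med11_collar0.0": 70.45, "result_th0.5_med1_collar0.0": 72.02,
    "result_th0.6_med11_collar0.0": 71.85, "result_th0.6_med1_collar0.0": 73.41,
    "result_th0.7_med11_collar0.0": 75.56, "result_th0.7_med1_collar0.0": 77.02,
}

best = min(dev_der, key=dev_der.get)   # setting with the lowest dev DER
print(parse_name(best))                # (0.5, 11, 0.0)
print(dev_der[best], test_der[best])   # 70.83 70.45
```

With these numbers, a threshold of 0.5 with an 11-frame median filter gives the lowest DER on both sets, and the 11-frame filter improves on `med1` at every threshold.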