---
tags:
- pyannote
- audio
- voice
- speech
- speaker
- speaker segmentation
- voice activity detection
- overlapped speech detection
- resegmentation
datasets:
- ami
- dihard
- voxconverse
license: mit
inference: false
---

# Pretrained speaker segmentation model

This model relies on `pyannote.audio` 2.0, which is still under development:

```bash
$ pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
```

## Basic inference

```python
>>> from pyannote.audio import Inference
>>> inference = Inference("pyannote/Segmentation")
>>> segmentation = inference("audio.wav")
```

## Advanced pipelines

### Voice activity detection

```python
>>> from pyannote.audio.pipelines import VoiceActivityDetection
>>> HYPER_PARAMETERS = {
...     # onset/offset activation thresholds
...     "onset": 0.5, "offset": 0.5,
...     # remove speech regions shorter than that many seconds
...     "min_duration_on": 0.0,
...     # fill non-speech regions shorter than that many seconds
...     "min_duration_off": 0.0,
... }
>>> pipeline = VoiceActivityDetection(segmentation="pyannote/Segmentation").instantiate(HYPER_PARAMETERS)
>>> vad = pipeline("audio.wav")
```

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO

### Overlapped speech detection

```python
>>> from pyannote.audio.pipelines import OverlappedSpeechDetection
>>> # hyper-parameters have the same structure as for voice activity detection
>>> pipeline = OverlappedSpeechDetection(segmentation="pyannote/Segmentation").instantiate(HYPER_PARAMETERS)
>>> osd = pipeline("audio.wav")
```

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO

### Segmentation

```python
>>> from pyannote.audio.pipelines import Segmentation
>>> pipeline = Segmentation(segmentation="pyannote/Segmentation").instantiate(HYPER_PARAMETERS)
>>> seg = pipeline("audio.wav")
```

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO

### Resegmentation

This pipeline refines an existing diarization output (called `baseline` below), provided as a `pyannote.core.Annotation` instance:

```python
>>> from pyannote.core import Annotation
>>> from pyannote.audio.pipelines import Resegmentation
>>> pipeline = Resegmentation(segmentation="pyannote/Segmentation", diarization="baseline").instantiate(HYPER_PARAMETERS)
>>> assert isinstance(baseline, Annotation)
>>> resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
```

Dataset         | `onset` | `offset` | `min_duration_on` | `min_duration_off`
----------------|---------|----------|-------------------|-------------------
AMI Mix-Headset | TODO    | TODO     | TODO              | TODO
DIHARD3         | TODO    | TODO     | TODO              | TODO
VoxConverse     | TODO    | TODO     | TODO              | TODO
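## Working with the outputs

Going back to the basic inference example, `Inference` returns raw speaker activation scores as a `pyannote.core.SlidingWindowFeature`. Here is a minimal sketch of how one might inspect them (the attribute names below are pyannote.core conventions, not part of this model card):

```python
>>> # "segmentation" comes from the basic inference example above
>>> scores = segmentation.data            # numpy array of raw speaker activation scores
>>> frames = segmentation.sliding_window  # time alignment of those scores
>>> print(scores.shape, frames.duration, frames.step)
```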
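The advanced pipelines above all return a `pyannote.core.Annotation` instance. As a minimal sketch (reusing the `vad` output from the voice activity detection example), detected regions can be iterated over or saved to disk in RTTM format:

```python
>>> # "vad" is a pyannote.core.Annotation (see the voice activity detection example above)
>>> for segment, _, label in vad.itertracks(yield_label=True):
...     print(f"{label}: from {segment.start:.1f}s to {segment.end:.1f}s")
>>> # save detected regions in RTTM format
>>> with open("audio.rttm", "w") as rttm:
...     vad.write_rttm(rttm)
```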
## Citations

```bibtex
@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
```