Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
ZHANG, L. STAFYLAKIS, T. LANDINI, F. DIEZ SÁNCHEZ, M. SILNOVA, A. BURGET, L.
Type
article in a collection outside WoS and Scopus
Language
English
Original Abstract
In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, vector representations of speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker characteristic information. On the other hand, giving the attractors more freedom to allow them to encode some extra (possibly speaker-specific) information leads to small but consistent diarization performance improvements. Despite architectural differences in EEND systems, the notion of attractors and frame embeddings is common to most of them and not specific to EEND-EDA. We believe that the main conclusions of this work can apply to other variants of EEND. Thus, we hope this paper will be a valuable contribution to guide the community to make more informed decisions when designing new systems.
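The abstract names the ingredients (attractors, frame embeddings, a variational information bottleneck) without formulas. The following is a minimal sketch, assuming the textbook VIB formulation: each attractor is a diagonal Gaussian with a standard-normal prior, sampled with the reparameterization trick, and a hypothetical weight `beta` scales the KL term added to the diarization loss. Speaker activity is computed, as in EEND-EDA, as the sigmoid of frame-embedding/attractor dot products and trained with permutation-invariant binary cross-entropy. All function names, `beta`, and the toy inputs are illustrative assumptions; this is not the authors' implementation, and the exact placement and weighting of the bottleneck in the paper may differ.

```python
import itertools
import math
import random


def sample_attractor(mu, logvar, rng):
    """Reparameterization trick: a = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a single attractor."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def activity_probs(frames, attractors):
    """p[t][s] = sigmoid(<e_t, a_s>): frame-embedding / attractor dot products."""
    return [[sigmoid(sum(e * a for e, a in zip(frame, att))) for att in attractors]
            for frame in frames]


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability/label pair, with clamping."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def pit_bce(probs, labels):
    """Permutation-invariant BCE: mean loss under the best speaker ordering.

    Assumes (hypothetically) that the number of attractors equals the number of speakers.
    """
    n_frames, n_spk = len(labels), len(labels[0])
    best = math.inf
    for perm in itertools.permutations(range(n_spk)):
        loss = sum(bce(probs[t][perm[s]], labels[t][s])
                   for t in range(n_frames) for s in range(n_spk))
        best = min(best, loss / (n_frames * n_spk))
    return best


def vib_eend_loss(frames, att_mu, att_logvar, labels, beta, rng):
    """Return (diarization PIT-BCE + beta * mean attractor KL, mean attractor KL)."""
    attractors = [sample_attractor(m, lv, rng) for m, lv in zip(att_mu, att_logvar)]
    probs = activity_probs(frames, attractors)
    kl = sum(kl_to_standard_normal(m, lv) for m, lv in zip(att_mu, att_logvar)) / len(att_mu)
    return pit_bce(probs, labels) + beta * kl, kl


if __name__ == "__main__":
    # Toy example: T=2 frames, D=2 embedding dims, S=2 speakers (all values hypothetical).
    frames = [[2.0, 0.0], [0.0, 2.0]]
    att_mu = [[3.0, -3.0], [-3.0, 3.0]]
    att_logvar = [[-4.0, -4.0], [-4.0, -4.0]]
    labels = [[1, 0], [0, 1]]
    total, kl = vib_eend_loss(frames, att_mu, att_logvar, labels, beta=1e-3, rng=random.Random(0))
    print(f"total loss {total:.4f}, attractor KL {kl:.4f}")
```

In this formulation, a larger `beta` pushes the attractor posteriors toward the uninformative prior and limits how much information (for example, speaker characteristics) they can carry, while a smaller `beta` gives the attractors more freedom, which mirrors the trade-off the abstract describes.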
Keywords
End-to-End Neural Diarization, Speaker Characteristic Information
Released
18. 6. 2024
Publisher
International Speech Communication Association
Location
Québec City
Pages from
123
Pages to
130
Pages count
8
BibTeX
@inproceedings{BUT193432,
author="ZHANG, L. and STAFYLAKIS, T. and LANDINI, F. and DIEZ SÁNCHEZ, M. and SILNOVA, A. and BURGET, L.",
title="Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
year="2024",
pages="123--130",
publisher="International Speech Communication Association",
address="Québec City",
doi="10.21437/odyssey.2024-18",
url="https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf"
}