Publication detail

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

ZHANG, L. STAFYLAKIS, T. LANDINI, F. DIEZ SÁNCHEZ, M. SILNOVA, A. BURGET, L.

Original Title

Type

article in a collection out of WoS and Scopus

Language

English

Original Abstract

In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what in- formation is essential for the model. EEND-EDA utilizes attrac- tors, vector representations of speakers in a conversation. Our analysis shows that, attractors do not necessarily have to con- tain speaker characteristic information. On the other hand, giv- ing the attractors more freedom to allow them to encode some extra (possibly speaker-specific) information leads to small but consistent diarization performance improvements. Despite ar- chitectural differences in EEND systems, the notion of attrac- tors and frame embeddings is common to most of them and not specific to EEND-EDA. We believe that the main conclu- sions of this work can apply to other variants of EEND. Thus, we hope this paper will be a valuable contribution to guide the community to make more informed decisions when designing new systems.

Keywords

End-to-End Neural Diarization, Speaker Characteristic Information

Authors

ZHANG, L.; STAFYLAKIS, T.; LANDINI, F.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; BURGET, L.

Released

18. 6. 2024

Publisher

International Speech Communication Association

Location

Québec City

Pages from

123

Pages to

130

Pages count

URL

https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf

BibTex

@inproceedings{BUT193432,
  author="ZHANG, L. and STAFYLAKIS, T. and LANDINI, F. and DIEZ SÁNCHEZ, M. and SILNOVA, A. and BURGET, L.",
  title="Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
  booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
  year="2024",
  pages="123--130",
  publisher="International Speech Communication Association",
  address="Québec City",
  doi="10.21437/odyssey.2024-18",
  url="https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf"
}

Documents

zhang_2024_odyssey.pdf

VUT

Faculties

University Institutes

Parts

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?