Publication detail
DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S.
Original Title
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Type
conference paper
Language
English
Original Abstract
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring that embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.
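The cannot-link constraint that the abstract attributes to cAHC can be illustrated with a minimal sketch. This is not the authors' implementation and it is not MS-VBx: the average linkage, the cosine distance, the stopping threshold, and the names `cosine_distance` and `constrained_ahc` are all assumptions chosen for illustration. It only shows how two embeddings from the same chunk can be kept from ever sharing a cluster.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def constrained_ahc(embeddings, chunk_ids, threshold):
    """Average-linkage AHC with cannot-link constraints (hypothetical helper).

    embeddings: one vector per (chunk, local speaker) pair.
    chunk_ids:  chunk index of each embedding; embeddings sharing a chunk
                index are never placed in the same cluster.
    threshold:  merging stops once no allowed pair is closer than this
                (an assumed stopping rule, not taken from the paper).
    Returns one cluster label per embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while True:
        best = None  # (distance, index_a, index_b) of the closest allowed pair
        for a in range(len(clusters)):
            chunks_a = {chunk_ids[i] for i in clusters[a]}
            for b in range(a + 1, len(clusters)):
                # Cannot-link: skip pairs of clusters that share a chunk.
                if any(chunk_ids[j] in chunks_a for j in clusters[b]):
                    continue
                dists = [
                    cosine_distance(embeddings[i], embeddings[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ]
                d = sum(dists) / len(dists)  # average linkage
                if d < threshold and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Two chunks, two local speakers each (toy 2-D embeddings).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_chunks = [0, 0, 1, 1]
toy_labels = constrained_ahc(toy_embeddings, toy_chunks, threshold=0.5)
```

In this toy case the first speaker of each chunk ends up in one cluster and the second speaker of each chunk in another, while the two embeddings of a single chunk stay apart even if they are identical.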
Keywords
speaker diarization, end-to-end, VBx, clustering
Authors
DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S.
Released
20 August 2023
Publisher
International Speech Communication Association
Location
Dublin
ISSN
1990-9772
Periodical
Proceedings of Interspeech
Volume
2023
Number
08
State
French Republic
Pages from
3477
Pages to
3481
Pages count
5
URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf
BibTeX
@inproceedings{BUT185573,
  author="DELCROIX, M. and TAWARA, N. and DIEZ SÁNCHEZ, M. and LANDINI, F. and SILNOVA, A. and OGAWA, A. and NAKATANI, T. and BURGET, L. and ARAKI, S.",
  title="Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="3477--3481",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-628",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf"
}