Publication detail
DELCROIX, M. TAWARA, N. DIEZ SÁNCHEZ, M. LANDINI, F. SILNOVA, A. OGAWA, A. NAKATANI, T. BURGET, L. ARAKI, S.
Original title
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Type
Conference proceedings paper indexed in WoS or Scopus
Language
English
Original abstract
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.
Keywords
speaker diarization, end-to-end, VBx, clustering
Authors
DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S.
Published
20 August 2023
Publisher
International Speech Communication Association
Place
Dublin
ISSN
1990-9772
Periodical
Proceedings of Interspeech
Volume
2023
Issue
08
Country
French Republic
Pages from
3477
Pages to
3481
Page count
5
URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf
BibTeX
@inproceedings{BUT185573,
  author="DELCROIX, M. and TAWARA, N. and DIEZ SÁNCHEZ, M. and LANDINI, F. and SILNOVA, A. and OGAWA, A. and NAKATANI, T. and BURGET, L. and ARAKI, S.",
  title="Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="3477--3481",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-628",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf"
}
Documents
delcroix23_interspeech2023_multi-stream.pdf