Publication detail

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

DELCROIX, M. TAWARA, N. DIEZ SÁNCHEZ, M. LANDINI, F. SILNOVA, A. OGAWA, A. NAKATANI, T. BURGET, L. ARAKI, S.

Original title

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

Type

Article in conference proceedings indexed in WoS or Scopus

Language

English

Original abstract

Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring that embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.
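The cAHC baseline mentioned in the abstract hinges on one constraint: two embeddings produced from the same chunk must never end up in the same cluster. The following is a minimal sketch of that idea, not the authors' implementation and not MS-VBx. It assumes cosine similarity, average linkage and a fixed stopping threshold, and the names `constrained_ahc` and `cosine` are hypothetical. The number of resulting clusters serves as the speaker count estimate.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def constrained_ahc(embeddings: List[Sequence[float]],
                    chunk_ids: List[int],
                    threshold: float) -> List[int]:
    """Average-linkage AHC with a cannot-link constraint (illustrative sketch).

    embeddings[i] is the speaker embedding of stream i; chunk_ids[i] is the
    chunk it was estimated from. Clusters sharing any chunk are never merged.
    Merging stops when no allowed pair has similarity above `threshold`.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while True:
        best, best_score = None, threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                chunks_a = {chunk_ids[i] for i in clusters[a]}
                chunks_b = {chunk_ids[j] for j in clusters[b]}
                if chunks_a & chunks_b:
                    continue  # cannot-link: both clusters contain the same chunk
                # Average linkage: mean pairwise cosine similarity
                score = sum(cosine(embeddings[i], embeddings[j])
                            for i in clusters[a] for j in clusters[b])
                score /= len(clusters[a]) * len(clusters[b])
                if score > best_score:
                    best, best_score = (a, b), score
        if best is None:
            break
        a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Two chunks, each with two speaker streams (toy 2-D embeddings).
embs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels = constrained_ahc(embs, chunk_ids=[0, 0, 1, 1], threshold=0.5)
print(labels)  # [0, 1, 0, 1]
```

Even if two same-chunk embeddings were identical, the constraint would keep them apart, which is what lets the clustering reflect the fact that EEND assigned them to different speakers within that chunk.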

Keywords

speaker diarization, end-to-end, VBx, clustering

Authors

DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S.

Published

20 August 2023

Publisher

International Speech Communication Association

Place

Dublin

ISSN

1990-9772

Periodical

Proceedings of Interspeech

Volume

2023

Issue

08

Country

French Republic

Pages from

3477

Pages to

3481

Number of pages

5

URL

BibTeX

@inproceedings{BUT185573,
  author="DELCROIX, M. and TAWARA, N. and DIEZ SÁNCHEZ, M. and LANDINI, F. and SILNOVA, A. and OGAWA, A. and NAKATANI, T. and BURGET, L. and ARAKI, S.",
  title="Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="3477--3481",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-628",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf"
}