Publication detail
KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N.
Original title
Discriminative Training of VBx Diarization
Type
Paper in proceedings indexed in WoS or Scopus
Language
English
Original abstract
Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss. We also propose a new loss that better correlates with the diarization error rate compared to binary cross-entropy - the default choice for diarization end-to-end systems. Proof-of-concept results across three datasets (AMI, CALLHOME, and DIHARD II) demonstrate the method's capability of automatically finding hyperparameters, achieving comparable performance to those found by extensive grid search, which typically requires additional hyperparameter behavior knowledge. Moreover, we show that discriminative fine-tuning of PLDA can further improve the model's performance. We release the source code with this publication.
Keywords
speaker diarization, VBx, clustering, variational Bayes, discriminative training
Authors
KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N.
Published
14 April 2024
Publisher
IEEE Signal Processing Society
Place
Seoul
ISBN
979-8-3503-4485-1
Book
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages from
11871
Pages to
11875
Number of pages
5
URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119
BibTeX
@inproceedings{BUT189781,
  author="KLEMENT, D. and DIEZ SÁNCHEZ, M. and LANDINI, F. and BURGET, L. and SILNOVA, A. and DELCROIX, M. and TAWARA, N.",
  title="Discriminative Training of VBx Diarization",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2024",
  pages="11871--11875",
  publisher="IEEE Signal Processing Society",
  address="Seoul",
  doi="10.1109/ICASSP48485.2024.10446119",
  isbn="979-8-3503-4485-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119"
}