Publication detail
Multi-Channel Extension of Pre-trained Models for Speaker Verification
MOŠNER, L. SERIZEL, R. BURGET, L. PLCHOT, O. VINCENT, E. PENG, J. ČERNOCKÝ, J.
Original Title
Multi-Channel Extension of Pre-trained Models for Speaker Verification
Type
conference paper
Language
English
Original Abstract
In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using SSL models with microphone array data is to prepend them with multi-channel speech enhancement. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors are propagated to the SSL model. We aim to alleviate these issues by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing the channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best published results on the dataset.
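The abstract describes the architecture only at a high level: per-channel processing interleaved with cross-channel information exchange, followed by fusion into a single channel. The toy NumPy sketch below illustrates that control flow. It is not the authors' METRO implementation. The helper names (`per_channel_layer`, `cross_channel_exchange`, `fuse_channels`, `metro_like_forward`) are hypothetical. The specific mechanisms are assumptions chosen for brevity: a shared ReLU layer stands in for a block of the pre-trained single-channel model, the exchange step adds the mean of the other channels, and fusion is a softmax-weighted sum over channels.

```python
import numpy as np

def per_channel_layer(x, w):
    """Apply the same (shared) layer to every channel independently.
    x: (C, T, D) channels x frames x features, w: (D, D).
    Assumption: stands in for one block of a single-channel pre-trained model."""
    return np.maximum(x @ w, 0.0)  # ReLU(x W), broadcast over the channel axis

def cross_channel_exchange(x):
    """Hypothetical exchange step: each channel receives, as a residual update,
    the mean of the *other* channels' features at the same frame."""
    c = x.shape[0]
    total = x.sum(axis=0, keepdims=True)       # (1, T, D)
    others_mean = (total - x) / max(c - 1, 1)  # mean over the other channels
    return x + others_mean

def fuse_channels(x, q):
    """Hypothetical attention-style fusion: score each channel per frame with
    vector q, softmax over channels, return the weighted sum with shape (T, D)."""
    scores = x @ q                                      # (C, T)
    scores = scores - scores.max(axis=0, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)       # sums to 1 over channels
    return (weights[..., None] * x).sum(axis=0)

def metro_like_forward(x, layer_weights, q):
    """Interleave per-channel layers with cross-channel exchange, then fuse."""
    for w in layer_weights:
        x = per_channel_layer(x, w)
        x = cross_channel_exchange(x)
    return fuse_channels(x, q)

rng = np.random.default_rng(0)
C, T, D = 4, 10, 8  # 4 microphones, 10 frames, 8-dim features (toy sizes)
x = rng.standard_normal((C, T, D))
layers = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3)]
q = rng.standard_normal(D)
out = metro_like_forward(x, layers, q)
print(out.shape)  # (10, 8)
```

Because the per-channel layer shares its weights across channels, it can reuse a model trained on single-channel audio. The exchange and fusion steps are the only parts that mix channels, which is how spatial information can reach the model beyond a separate pre-processing stage.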
Keywords
multi-channel speaker verification, pre-trained models
Authors
MOŠNER, L.; SERIZEL, R.; BURGET, L.; PLCHOT, O.; VINCENT, E.; PENG, J.; ČERNOCKÝ, J.
Released
1 September 2024
Publisher
International Speech Communication Association
Location
Kos
ISSN
1990-9772
Periodical
Proceedings of Interspeech
Year of study
2024
Number
9
State
French Republic
Pages from
2135
Pages to
2139
Pages count
5
URL
https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf
BibTeX
@inproceedings{BUT193682,
author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2024",
journal="Proceedings of Interspeech",
volume="2024",
number="9",
pages="2135--2139",
publisher="International Speech Communication Association",
address="Kos",
doi="10.21437/Interspeech.2024-1260",
issn="1990-9772",
url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}