Publication detail
Original title
Speaker activity driven neural speech extraction
Type
conference paper indexed in WoS or Scopus
Language
English
Original abstract
Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated, such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where speaker activity obtained from a diarization system is used as a speaker clue for ADEnet. We show that this simple yet practical approach can successfully extract speakers after diarization, which leads to improved ASR performance when using a single microphone, especially in highly overlapping conditions, with a relative word error rate reduction of up to 25%.
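The abstract describes the approach only at a high level. Below is a minimal sketch, in PyTorch, of the underlying idea: a frame-level target-speaker activity flag is concatenated with the mixture's spectral features, and the network estimates a time-frequency mask for the target speaker. The class name, layer sizes, and feature choices are illustrative assumptions, not the paper's actual ADEnet architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActivityDrivenExtractor(nn.Module):
    """Sketch of activity-driven extraction (illustrative, not the paper's ADEnet)."""

    def __init__(self, n_fft=512, hidden=256):
        super().__init__()
        self.n_fft = n_fft
        n_bins = n_fft // 2 + 1
        # BLSTM over frames of [log-magnitude spectrum ; activity flag]
        self.blstm = nn.LSTM(n_bins + 1, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, n_bins), nn.Sigmoid())

    def forward(self, mixture, activity):
        # mixture: (batch, samples); activity: (batch, frames), 1 = target talks
        win = torch.hann_window(self.n_fft, device=mixture.device)
        spec = torch.stft(mixture, self.n_fft, hop_length=self.n_fft // 2,
                          window=win, return_complex=True)        # (B, F, T)
        feats = torch.log1p(spec.abs()).transpose(1, 2)           # (B, T, F)
        n_frames = feats.size(1)
        # Align the activity sequence to the STFT frames (trim or zero-pad)
        act = F.pad(activity[:, :n_frames],
                    (0, max(0, n_frames - activity.size(1))))
        h, _ = self.blstm(torch.cat([feats, act.unsqueeze(-1)], dim=-1))
        mask = self.mask(h).transpose(1, 2)                       # (B, F, T)
        # Apply the mask to the complex spectrum and resynthesize the waveform
        return torch.istft(mask * spec, self.n_fft,
                           hop_length=self.n_fft // 2, window=win,
                           length=mixture.size(1))

# Toy usage: extract a speaker who is active for the whole 1 s mixture.
net = ActivityDrivenExtractor()
mix = torch.randn(1, 16000)                    # 1 s of 16 kHz audio
frames = mix.size(1) // (512 // 2) + 1         # STFT frames (center=True)
est = net(mix, torch.ones(1, frames))          # (1, 16000) estimated target

In the meeting scenario the abstract describes, the activity input would come from a diarization system's per-speaker segmentation rather than oracle labels.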
Keywords
Speech extraction, Speaker activity, Speech enhancement, Meeting recognition, Neural network
Authors
DELCROIX, M.; ŽMOLÍKOVÁ, K.; OCHIAI, T.; KINOSHITA, K.; NAKATANI, T.
Published
June 6, 2021
Publisher
IEEE Signal Processing Society
Place
Toronto
ISBN
978-1-7281-7605-5
Book
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages from
6099
Pages to
6103
Page count
5
URL
https://www.fit.vut.cz/research/publication/12479/
BibTeX
@inproceedings{BUT171749,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and OCHIAI, T. and KINOSHITA, K. and NAKATANI, T.",
  title="Speaker activity driven neural speech extraction",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2021",
  pages="6099--6103",
  publisher="IEEE Signal Processing Society",
  address="Toronto",
  doi="10.1109/ICASSP39728.2021.9414998",
  isbn="978-1-7281-7605-5",
  url="https://www.fit.vut.cz/research/publication/12479/"
}
Documents
delcroix_icassp2021_09414998.pdf