Publication detail
ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; OGAWA, A.; NAKATANI, T.
Original title
Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction
Type
Paper in conference proceedings indexed in WoS or Scopus
Language
English
Original abstract
Recently, schemes employing deep neural networks (DNNs) to extract speech from noisy observations have shown great potential for noise-robust automatic speech recognition. However, these schemes are not well suited to cases where the interference is another speaker. To enable extraction of a target speaker from a mixture of speakers, we recently proposed informing the neural network with speaker information extracted from an adaptation utterance of the same speaker. In our previous work, we explored ways to inform the network about the speaker and found a speaker adaptive layer approach to be suitable for this task. In those experiments, we used speaker features designed for speaker recognition as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose using a sequence summarizing scheme that enables the speaker representation to be learned jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of the proposed method as a front-end for speech recognition, and we explore the effect of additional noise on its performance.
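The abstract combines two ideas: a speaker adaptive layer whose behaviour depends on speaker information, and a sequence summarizing scheme that derives that information from an adaptation utterance. The following is a minimal forward-pass sketch in plain Python, assuming a factorized adaptive layer (K sub-layers mixed by speaker weights alpha) and an auxiliary network of one linear layer plus softmax whose per-frame outputs are averaged over time. The function names (`sequence_summary`, `speaker_adaptive_layer`), the weight names (`W_aux`, `subs`), the sizes, and the softmax choice are all hypothetical and are not taken from the paper. The sketch omits training; the relevant point is that the time average is differentiable, so in end-to-end training the auxiliary weights could be updated together with the main layers.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def sequence_summary(adapt_frames, W_aux):
    """Hypothetical auxiliary network: one linear layer + softmax per frame,
    averaged over all T frames of the adaptation utterance.

    Returns K speaker weights (alpha) that sum to 1."""
    K = len(W_aux)
    acc = [0.0] * K
    for frame in adapt_frames:
        out = softmax(matvec(W_aux, frame))
        acc = [a + o for a, o in zip(acc, out)]
    return [a / len(adapt_frames) for a in acc]


def speaker_adaptive_layer(x, sub_weights, alpha):
    """Hypothetical factorized layer: h = relu(sum_k alpha_k * W_k x).

    sub_weights holds K matrices, each H x D (a list of H rows of length D)."""
    H = len(sub_weights[0])
    h = [0.0] * H
    for a_k, W_k in zip(alpha, sub_weights):
        y = matvec(W_k, x)
        h = [hi + a_k * yi for hi, yi in zip(h, y)]
    return relu(h)


# Usage with random toy values (no trained weights, no real audio features).
random.seed(0)
D, H, K, T = 4, 3, 2, 5  # feature dim, hidden units, sub-layers, adaptation frames
adapt = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
W_aux = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
subs = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(H)] for _ in range(K)]
mix_frame = [random.gauss(0, 1) for _ in range(D)]

alpha = sequence_summary(adapt, W_aux)  # speaker summary from the adaptation utterance
h = speaker_adaptive_layer(mix_frame, subs, alpha)  # hidden activations for one mixture frame
```

Other ways of turning the summary into adaptation (for example, scaling hidden units element-wise instead of mixing sub-layers) would fit the same outline; the sketch does not settle which variant the paper uses.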
Keywords
speaker extraction, speaker adaptive neural network, multi-speaker speech recognition, speaker representation learning, beamforming
Authors
ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; OGAWA, A.; NAKATANI, T.
Published
16 December 2017
Publisher
IEEE Signal Processing Society
Place
Okinawa
ISBN
978-1-5090-4788-8
Book
Proceedings of ASRU 2017
Pages from
8
Pages to
15
Page count
8
URL
http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_asru2017.pdf
BibTeX
@inproceedings{BUT144503,
  author="Kateřina {Žmolíková} and Marc {Delcroix} and Keisuke {Kinoshita} and Takuya {Higuchi} and Atsunori {Ogawa} and Tomohiro {Nakatani}",
  title="Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction",
  booktitle="Proceedings of ASRU 2017",
  year="2017",
  pages="8--15",
  publisher="IEEE Signal Processing Society",
  address="Okinawa",
  doi="10.1109/ASRU.2017.8268910",
  isbn="978-1-5090-4788-8",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_asru2017.pdf"
}