Publication detail

Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion

ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; NAKATANI, T.; ČERNOCKÝ, J.

Original title

Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion

Type

paper in proceedings indexed in WoS or Scopus

Language

English

Original abstract

This paper addresses the problem of recognizing speech corrupted by overlapping speakers in a multichannel setting. To extract a target speaker from the mixture, we use a neural network based beamformer which uses masks estimated by a neural network to compute statistically optimal spatial filters. Following our previous work, we inform the neural network about the target speaker using information extracted from an adaptation utterance, enabling the network to track the target speaker. While in the previous work, this method was used to separately extract the speaker and then pass such preprocessed speech to a speech recognition system, here we explore training both systems jointly with a common speech recognition criterion. We show that integrating the two systems and training for the final objective improves the performance. In addition, the integration enables further sharing of information between the acoustic model and the speaker extraction system, by making use of the predicted HMM-state posteriors to refine the masks used for beamforming.

Keywords

Speaker extraction, joint training, speaker adaptive neural network, beamforming, speech recognition

Authors

ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; NAKATANI, T.; ČERNOCKÝ, J.

Published

15 April 2018

Publisher

IEEE Signal Processing Society

Place

Calgary

ISBN

978-1-5386-4658-8

Book

Proceedings of ICASSP 2018

Pages from

6702

Pages to

6706

Page count

5

URL

https://www.fit.vut.cz/research/publication/11722/

BibTeX

@inproceedings{BUT155044,
  author="Kateřina {Žmolíková} and Marc {Delcroix} and Keisuke {Kinoshita} and Takuya {Higuchi} and Tomohiro {Nakatani} and Jan {Černocký}",
  title="Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion",
  booktitle="Proceedings of ICASSP 2018",
  year="2018",
  pages="6702--6706",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8461533",
  isbn="978-1-5386-4658-8",
  url="https://www.fit.vut.cz/research/publication/11722/"
}
