Publication detail
CHO, J.; BASKAR, M.; LI, R.; WIESNER, M.; MALLIDI, S.; YALTA, N.; KARAFIÁT, M.; WATANABE, S.; HORI, T.
Original title
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling
Type
Conference proceedings paper indexed in WoS or Scopus
Language
English
Original abstract
Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of requiring more data compared to conventional DNN-HMM systems. In this work, we attempt to use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port them towards 4 other BABEL languages using transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that the transfer learning approach from the multilingual model shows substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in terms of %WER, and achieves recognition performance comparable to the models trained with twice more training data.
Keywords
Automatic speech recognition (ASR), sequence to sequence, multilingual setup, transfer learning, language modeling
Authors
CHO, J.; BASKAR, M.; LI, R.; WIESNER, M.; MALLIDI, S.; YALTA, N.; KARAFIÁT, M.; WATANABE, S.; HORI, T.
Published
18 December 2018
Publisher
IEEE Signal Processing Society
Place
Athens
ISBN
978-1-5386-4334-1
Book
Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018)
Pages from
521
Pages to
527
Number of pages
7
URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8639655
BibTeX
@inproceedings{BUT163489,
  author="CHO, J. and BASKAR, M. and LI, R. and WIESNER, M. and MALLIDI, S. and YALTA, N. and KARAFIÁT, M. and WATANABE, S. and HORI, T.",
  title="Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling",
  booktitle="Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018)",
  year="2018",
  pages="521--527",
  publisher="IEEE Signal Processing Society",
  address="Athens",
  doi="10.1109/SLT.2018.8639655",
  isbn="978-1-5386-4334-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8639655"
}