Publication detail
KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J.
Original title
BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge
Type
conference proceedings paper not indexed in WoS or Scopus
Language
English
Original abstract
This paper describes research on the development of Automatic Speech Recognition (ASR) systems for the Albayzin 2022 Challenge. We train and evaluate both hybrid systems and those based on end-to-end models. We also investigate the use of self-supervised learning speech representations from pre-trained models and their impact on ASR performance (as opposed to training models directly from scratch). Additionally, we apply the Whisper model in a zero-shot fashion, postprocessing its output to fit the required transcription format. On top of tuning the model architectures and overall training schemes, we improve the robustness of our models by augmenting the training data with noises extracted from the target domain. Moreover, we apply rescoring with an external language model (LM) on top of N-best hypotheses to adjust each sentence score and pick the single best hypothesis. All these efforts lead to a significant reduction in word error rate (WER). Our single best system and the fusion of selected systems achieved 16.3% and 13.7% WER, respectively, on the RTVE2020 test partition, i.e. the official evaluation partition from the previous Albayzin challenge.
Keywords
ASR fusion, end-to-end model, self-supervised learning, automatic speech recognition.
Authors
KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J.
Published
14 November 2022
Publisher
International Speech Communication Association
Place
Granada
Pages from
276
Pages to
280
Number of pages
5
URL
https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf
BibTeX
@inproceedings{BUT180167,
  author="Martin {Kocour} and Jahnavi {Umesh} and Martin {Karafiát} and Ján {Švec} and Fernando {Lopez} and Karel {Beneš} and Mireia {Diez Sánchez} and Igor {Szőke} and Jordi {Luque} and Karel {Veselý} and Lukáš {Burget} and Jan {Černocký}",
  title="BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge",
  booktitle="Proceedings of IberSpeech 2022",
  year="2022",
  pages="276--280",
  publisher="International Speech Communication Association",
  address="Granada",
  doi="10.21437/IberSPEECH.2022-56",
  url="https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf"
}