Publication detail
KOCOUR, M.; CÁMBARA, G.; LUQUE, J.; BONET, D.; FARRÚS, M.; KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J.
Original title
BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge
Type
Paper in conference proceedings (not indexed in WoS or Scopus)
Language
English
Original abstract
This paper describes the joint effort of BUT and Telefónica Research on the development of Automatic Speech Recognition systems for the Albayzin 2020 Challenge. We compare approaches based on either hybrid or end-to-end models. In hybrid modelling, we explore the impact of a SpecAugment layer on performance. For end-to-end modelling, we used a convolutional neural network with gated linear units (GLUs). The performance of such a model is also evaluated with an additional n-gram language model to improve word error rates. We further inspect source separation methods to extract speech from noisy environments (i.e. TV shows). More precisely, we assess the effect of using a neural-based music separator named Demucs. A fusion of our best systems achieved 23.33% WER in official Albayzin 2020 evaluations. Aside from techniques used in our final submitted systems, we also describe our efforts in retrieving high-quality transcripts for training.
Keywords
fusion, end-to-end model, hybrid model, semisupervised, automatic speech recognition, convolutional neural network.
Authors
KOCOUR, M.; CÁMBARA, G.; LUQUE, J.; BONET, D.; FARRÚS, M.; KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J.
Published
24 March 2021
Publisher
International Speech Communication Association
Place
Valladolid
Pages from
113
Pages to
117
Number of pages
5
URL
https://www.isca-speech.org/archive/iberspeech_2021/kocour21_iberspeech.html
BibTeX
@inproceedings{BUT175823,
  author="KOCOUR, M. and CÁMBARA, G. and LUQUE, J. and BONET, D. and FARRÚS, M. and KARAFIÁT, M. and VESELÝ, K. and ČERNOCKÝ, J.",
  title="BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge",
  booktitle="Proceedings of IberSPEECH 2021",
  year="2021",
  pages="113--117",
  publisher="International Speech Communication Association",
  address="Valladolid",
  doi="10.21437/IberSPEECH.2021-24",
  url="https://www.isca-speech.org/archive/iberspeech_2021/kocour21_iberspeech.html"
}