Publication detail
ZULUAGA-GOMEZ, J.; PRASAD, A.; NIGMATULINA, I.; SARFJOO, S.; MOTLÍČEK, P.; KLEINERT, M.; HELMKE, H.; OHNEISER, O.; ZHAN, Q.
Original Title
How Does Pre-Trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? an Extensive Benchmark on Air Traffic Control Communications
Type
conference paper
Language
English
Original Abstract
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few works have investigated the impact on performance when the data properties differ substantially between the pre-training and fine-tuning phases, termed domain shift. We target this scenario by analyzing the robustness of Wav2Vec 2.0 and XLS-R models on downstream ASR for a completely unseen domain, air traffic control (ATC) communications. We benchmark these two models on several open-source and challenging ATC databases with signal-to-noise ratios between 5 and 20 dB. Relative word error rate (WER) reductions of 20% to 40% are obtained in comparison to hybrid-based ASR baselines by only fine-tuning E2E acoustic models with a smaller fraction of labeled data. We analyze WERs in the low-resource scenario and the gender bias carried by one ATC dataset.
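The WER and relative-reduction figures quoted above follow the standard definitions (Levenshtein distance over word tokens, normalized by reference length). A minimal self-contained sketch in plain Python, not the authors' evaluation code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, as reported against hybrid baselines."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# ATC-style example utterances (hypothetical, for illustration only):
# dropping "flight" from a 6-word reference is one deletion -> WER = 1/6.
print(wer("climb flight level three four zero", "climb level three four zero"))
# e.g., a baseline WER of 10.0% improved to 6.0% is a 40% relative reduction.
print(relative_wer_reduction(10.0, 6.0))
```

The paper's 20–40% relative reductions are in this sense: a fraction of the baseline's error rate removed, not absolute percentage points.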
Keywords
Automatic speech recognition, Wav2Vec 2.0, self-supervised pre-training, air traffic control communications.
Authors
ZULUAGA-GOMEZ, J.; PRASAD, A.; NIGMATULINA, I.; SARFJOO, S.; MOTLÍČEK, P.; KLEINERT, M.; HELMKE, H.; OHNEISER, O.; ZHAN, Q.
Released
9. 1. 2023
Publisher
IEEE Signal Processing Society
Location
Doha
ISBN
978-1-6654-7189-3
Book
IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Pages from
205
Pages to
212
Pages count
8
URL
https://ieeexplore.ieee.org/document/10022724
BibTex
@inproceedings{BUT185194,
  author="ZULUAGA-GOMEZ, J. and PRASAD, A. and NIGMATULINA, I. and SARFJOO, S. and MOTLÍČEK, P. and KLEINERT, M. and HELMKE, H. and OHNEISER, O. and ZHAN, Q.",
  title="How Does Pre-Trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? an Extensive Benchmark on Air Traffic Control Communications",
  booktitle="IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",
  year="2023",
  pages="205--212",
  publisher="IEEE Signal Processing Society",
  address="Doha",
  doi="10.1109/SLT54892.2023.10022724",
  isbn="978-1-6654-7189-3",
  url="https://ieeexplore.ieee.org/document/10022724"
}
Documents
zulaga-gomez_amrutha prasad_slt_2023_10022724.pdf