Publication detail
BENEŠ, K. BURGET, L.
Original Title
Text Augmentation for Language Models in High Error Recognition Scenario
Type
conference paper
Language
English
Original Abstract
In this paper, we explore several data augmentation strategies for training language models for speech recognition. We compare augmentation based on global error statistics with one based on unigram statistics of ASR errors and with label smoothing and its sampled variant. Additionally, we investigate the stability and the predictive power of perplexity estimated on augmented data. Despite being trivial, augmentation driven by global substitution, deletion and insertion rates achieves the best rescoring results. On the other hand, even though the associated perplexity measure is stable, it gives no better prediction of the final error rate than the vanilla one. Our best augmentation scheme increases the WER improvement from second-pass rescoring from 1.1% to 1.9% absolute on the CHiME-6 challenge.
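The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of what augmentation driven by global substitution, deletion and insertion rates could look like. The function name `augment`, its parameters, and the choice to draw replacement and inserted words uniformly from the vocabulary are assumptions made for illustration, not the authors' implementation.

```python
import random


def augment(tokens, vocab, p_sub, p_del, p_ins, rng):
    """Corrupt a token sequence using three global error rates.

    Hypothetical helper: each input token is deleted with probability
    p_del, substituted with probability p_sub, and otherwise kept.
    After each input position, a word is inserted with probability p_ins.
    Replacement and inserted words are drawn uniformly from vocab
    (an assumption; a substitution may by chance pick the original word).
    """
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_del:
            pass  # deletion: drop the token
        elif r < p_del + p_sub:
            out.append(rng.choice(vocab))  # substitution
        else:
            out.append(tok)  # token kept unchanged
        if rng.random() < p_ins:
            out.append(rng.choice(vocab))  # insertion
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["the", "a", "cat", "dog", "sat", "ran"]
    sentence = "the cat sat".split()
    # The rates below are illustrative values, not taken from the paper.
    print(augment(sentence, vocab, p_sub=0.2, p_del=0.1, p_ins=0.05, rng=rng))
```

In such a scheme, the language model would be trained on the corrupted text so that its training input resembles erroneous ASR hypotheses; how the corrupted sequences are paired with training targets is not specified here.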
Keywords
data augmentation, error simulation, language modeling, automatic speech recognition
Authors
BENEŠ, K.; BURGET, L.
Released
30. 8. 2021
Publisher
International Speech Communication Association
Location
Brno
ISSN
1990-9772
Periodical
Proceedings of Interspeech
Volume
2021
Number
8
Country
French Republic
Pages from
1872
Pages to
1876
Pages count
5
URL
https://www.isca-speech.org/archive/interspeech_2021/benes21_interspeech.html
BibTeX
@inproceedings{BUT175841,
  author="Karel {Beneš} and Lukáš {Burget}",
  title="Text Augmentation for Language Models in High Error Recognition Scenario",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2021",
  journal="Proceedings of Interspeech",
  volume="2021",
  number="8",
  pages="1872--1876",
  publisher="International Speech Communication Association",
  address="Brno",
  doi="10.21437/Interspeech.2021-627",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/interspeech_2021/benes21_interspeech.html"
}
Documents
benes21_interspeech.pdf