Publication detail

Text Augmentation for Language Models in High Error Recognition Scenario

BENEŠ, K. BURGET, L.

Original Title

Text Augmentation for Language Models in High Error Recognition Scenario

Type

conference paper

Language

English

Original Abstract

In this paper, we explore several data augmentation strategies for training of language models for speech recognition. We compare augmentation based on global error statistics with one based on unigram statistics of ASR errors and with label smoothing and its sampled variant. Additionally, we investigate the stability and the predictive power of perplexity estimated on augmented data. Despite being trivial, augmentation driven by global substitution, deletion and insertion rates achieves the best rescoring results. On the other hand, even though the associated perplexity measure is stable, it gives no better prediction of the final error rate than the vanilla one. Our best augmentation scheme increases the WER improvement from second-pass rescoring from 1.1% to 1.9% absolute on the CHiME-6 challenge.
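
The augmentation driven by global substitution, deletion and insertion rates can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `augment`, its parameters, the choice to draw replacement and inserted words uniformly from the vocabulary, and the rate values in the usage line are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random


def augment(tokens, vocab, sub_rate, del_rate, ins_rate, rng):
    """Corrupt a token sequence using global error rates (illustrative only).

    Assumption: substituted and inserted words are sampled uniformly from
    `vocab`; the abstract does not say how they are chosen.
    """
    out = []
    for tok in tokens:
        r = rng.random()
        if r < del_rate:
            pass  # deletion: drop the token
        elif r < del_rate + sub_rate:
            out.append(rng.choice(vocab))  # substitution (may pick the same word)
        else:
            out.append(tok)  # token kept unchanged
        if rng.random() < ins_rate:
            out.append(rng.choice(vocab))  # insertion after the current position
    return out


# Hypothetical usage: the rates below are placeholders, not values from the paper.
rng = random.Random(0)
vocab = ["the", "a", "to", "of", "and"]
print(augment("turn the kitchen light on".split(), vocab, 0.2, 0.1, 0.05, rng))
```

In such a scheme, the corrupted sentences would be used as language model training input, so that the model sees text resembling erroneous ASR hypotheses.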

Keywords

data augmentation, error simulation, language modeling, automatic speech recognition

Authors

BENEŠ, K.; BURGET, L.

Released

30. 8. 2021

Publisher

International Speech Communication Association

Location

Brno

ISSN

1990-9772

Periodical

Proceedings of Interspeech

Year of study

2021

Number

8

State

French Republic

Pages from

1872

Pages to

1876

Pages count

5

URL

BibTex

@inproceedings{BUT175841,
  author="Karel {Beneš} and Lukáš {Burget}",
  title="Text Augmentation for Language Models in High Error Recognition Scenario",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2021",
  journal="Proceedings of Interspeech",
  volume="2021",
  number="8",
  pages="1872--1876",
  publisher="International Speech Communication Association",
  address="Brno",
  doi="10.21437/Interspeech.2021-627",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/interspeech_2021/benes21_interspeech.html"
}
