Publication detail

Analysis of X-Vectors for Low-Resource Speech Recognition

KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J.; PROFANT, J.; NYTRA, J.; HLAVÁČEK, M.; PAVLÍČEK, T.

Original Title

Analysis of X-Vectors for Low-Resource Speech Recognition

Type

conference paper

Language

English

Original Abstract

The paper presents a study of the usability of x-vectors for adaptation of automatic speech recognition (ASR) systems. X-vectors are Neural Network (NN)-based speaker embeddings recently proposed in speaker recognition (SR). They quickly replaced the common i-vectors and became the new state-of-the-art technique. Here, the same approach is adopted for ASR in the hope of a similar outcome. All experiments were done on ASR for the latest IARPA MATERIAL evaluation, run on the Pashto language. An absolute improvement of over 1% was observed with x-vectors over traditional i-vectors, even when the x-vector extractor was not trained on target Pashto data.

Keywords

speech recognition, adaptation, x-vectors, data augmentation, robustness

Authors

KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J.; PROFANT, J.; NYTRA, J.; HLAVÁČEK, M.; PAVLÍČEK, T.

Released

6. 6. 2021

Publisher

IEEE Signal Processing Society

Location

Toronto, Ontario

ISBN

978-1-7281-7605-5

Book

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pages from

6998

Pages to

7002

Pages count

5

URL

https://www.fit.vut.cz/research/publication/12525/

BibTeX

@inproceedings{BUT175794,
  author="KARAFIÁT, M. and VESELÝ, K. and ČERNOCKÝ, J. and PROFANT, J. and NYTRA, J. and HLAVÁČEK, M. and PAVLÍČEK, T.",
  title="Analysis of X-Vectors for Low-Resource Speech Recognition",
  booktitle="ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
  year="2021",
  pages="6998--7002",
  publisher="IEEE Signal Processing Society",
  address="Toronto, Ontario",
  doi="10.1109/ICASSP39728.2021.9414725",
  isbn="978-1-7281-7605-5",
  url="https://www.fit.vut.cz/research/publication/12525/"
}

Documents

karafiat_icassp2021_09414725.pdf
