Publication details

Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models

KESIRAJU, S. SARVAŠ, M. PAVLÍČEK, T. MACAIRE, C. CIUBA, A.

Original title

Type

Paper in conference proceedings indexed in WoS or Scopus

Language

English

Original abstract

This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST). We conducted experiments on both simulated and real low-resource setups, on language pairs English - Portuguese, and Tamasheq - French respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition system acts as a good initialization under low-resource scenarios. Furthermore, using the CTC as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify various factors (initializations, objectives, and hyperparameters) that contribute the most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved 7.3 BLEU score on Tamasheq - French data, outperforming prior published works from IWSLT 2022 by 1.6 points.

Keywords

speech translation, low-resource, multilingual, speech recognition

Authors

KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A.

Published

20 August 2023

Publisher

International Speech Communication Association

Place

Dublin

ISSN

1990-9772

Periodical

Proceedings of Interspeech

Volume

2023

Number

Country

France

Pages from

2148

Pages to

2152

Page count

URL

https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf

BibTeX

@inproceedings{BUT185572,
  author="KESIRAJU, S. and SARVAŠ, M. and PAVLÍČEK, T. and MACAIRE, C. and CIUBA, A.",
  title="Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="2148--2152",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-2506",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf"
}

Documents

kesiraju23_interspeech2023_strategies.pdf
