Publication detail
PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J.
Original title
Improving Speaker Verification with Self-Pretrained Transformer Models
Type
conference paper indexed in WoS or Scopus
Language
English
Original abstract
Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received a rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer structures from the limitations of the pre-training. In this paper, we introduce a hierarchical training approach, named self-pretraining, in which Transformer models are pretrained and finetuned on the same dataset. Three pre-trained models including HuBERT, Conformer and WavLM are evaluated on four different speaker verification datasets with varying sizes. Our experiments show that these self-pretrained models achieve competitive performance on downstream speaker verification tasks with only one-third of the data compared to Librispeech pretraining, such as VoxCeleb1 and CNCeleb1. Furthermore, when pre-training only on the VoxCeleb2-dev, the Conformer model outperforms the one pre-trained on 94k hours of data using the same fine-tuning settings.
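The sketch below is a rough, non-authoritative illustration of the self-pretraining idea in the abstract: the same in-domain data (e.g. VoxCeleb2-dev) is used first for self-supervised pre-training of a Transformer encoder and then for supervised speaker-verification fine-tuning. The toy encoder, the masked-frame reconstruction objective, the mean pooling, and the classification head are assumptions made for brevity, not the exact HuBERT/WavLM/Conformer recipes of the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy Transformer encoder over acoustic feature frames."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=1024,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feats):                  # feats: (B, T, feat_dim)
        return self.encoder(self.proj(feats))  # (B, T, d_model)

def pretrain_step(encoder, predictor, feats, mask_prob=0.15):
    """Stage 1: masked-frame reconstruction on unlabeled in-domain audio."""
    mask = torch.rand(feats.shape[:2]) < mask_prob            # (B, T) bool
    corrupted = feats.masked_fill(mask.unsqueeze(-1), 0.0)    # zero out masked frames
    hidden = encoder(corrupted)
    recon = predictor(hidden)                                  # (B, T, feat_dim)
    return F.mse_loss(recon[mask], feats[mask])                # loss only on masked frames

def finetune_step(encoder, head, feats, speaker_ids):
    """Stage 2: speaker classification on the SAME dataset, now with labels."""
    hidden = encoder(feats)
    embedding = hidden.mean(dim=1)             # simple temporal pooling
    logits = head(embedding)
    return F.cross_entropy(logits, speaker_ids)

if __name__ == "__main__":
    n_speakers, feat_dim, d_model = 100, 80, 256
    encoder = Encoder(feat_dim, d_model)
    predictor = nn.Linear(d_model, feat_dim)   # pretraining head, discarded after stage 1
    head = nn.Linear(d_model, n_speakers)      # speaker-classification head for stage 2

    feats = torch.randn(8, 200, feat_dim)      # dummy batch of feature frames
    speakers = torch.randint(0, n_speakers, (8,))

    print("pretrain loss:", pretrain_step(encoder, predictor, feats).item())
    print("finetune loss:", finetune_step(encoder, head, feats, speakers).item())

In practice both stages would iterate over the full dataset with an optimizer; the point of the sketch is only the data flow, with one encoder shared across the self-supervised and supervised stages.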
Keywords
speaker verification, pre-trained speech transformer model, pre-training
Authors
PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J.
Published
20 August 2023
Publisher
International Speech Communication Association
Location
Dublin
ISSN
1990-9772
Journal
Proceedings of Interspeech
Volume
2023
Issue
08
Country
French Republic
Pages from
5361
Pages to
5365
Page count
5
URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf
BibTeX
@inproceedings{BUT185575,
  author="Junyi {Peng} and Oldřich {Plchot} and Themos {Stafylakis} and Ladislav {Mošner} and Lukáš {Burget} and Jan {Černocký}",
  title="Improving Speaker Verification with Self-Pretrained Transformer Models",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="5361--5365",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-453",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf"
}