Publication detail

Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint

PRASAD, A. CAROFILIS, A. VANDERREYDT, G. KHALIL, D. MADIKERI, S. MOTLÍČEK, P. SCHUEPBACH, C.

Original title

Type

Conference proceedings paper indexed in WoS or Scopus

Language

English

Original abstract

Self-supervised models trained with high linguistic diversity, such as the XLS-R model, can be effectively fine-tuned for the language recognition task. Typically, a back-end classifier followed by statistics pooling layer are added during training. Commonly used back-end classifiers require a large number of parameters to be trained, which is not ideal in limited data conditions. In this work, we explore smaller parameter back-ends using factorized Time Delay Neural Network (TDNN-F). The TDNN-F architecture is also integrated into Emphasized Channel Attention, Propagation and Aggregation-TDNN (ECAPA-TDNN) models, termed ECAPA-TDNN-F, reducing the number of parameters by 30 to 50% absolute, with competitive accuracies and no change in minimum cost. The results show that the ECAPA-TDNN-F can be extended to tasks where ECAPA-TDNN is suitable. We also test the effectiveness of a linear classifier and a variant, the Orthonormal linear classifier, previously used in x-vector type systems. The models are trained with NIST LRE17 data and evaluated on NIST LRE17, LRE22 and the ATCO2 LID datasets. Both linear classifiers outperform conventional back-ends with improvements in accuracy between 0.9% and 9.1%.

Keywords

Language Identification, Transformers, Wav2Vec2, fine-tuning, low-resource, out-of-domain

Authors

PRASAD, A.; CAROFILIS, A.; VANDERREYDT, G.; KHALIL, D.; MADIKERI, S.; MOTLÍČEK, P.; SCHUEPBACH, C.

Published

14 April 2024

Publisher

IEEE Signal Processing Society

Place

Seoul

ISBN

979-8-3503-4485-1

Book

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Pages from

11921

Pages to

11925

Number of pages

URL

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751

BibTeX

@inproceedings{BUT193354,
  author="PRASAD, A. and CAROFILIS, A. and VANDERREYDT, G. and KHALIL, D. and MADIKERI, S. and MOTLÍČEK, P. and SCHUEPBACH, C.",
  title="Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2024",
  pages="11921--11925",
  publisher="IEEE Signal Processing Society",
  address="Seoul",
  doi="10.1109/ICASSP48485.2024.10446751",
  isbn="979-8-3503-4485-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751"
}

Documents

prasad_icassp2024_fine-tuning.pdf
