Publication detail

Advancing speaker embedding learning: Wespeaker toolkit for research and production

WANG, S.; CHEN, Z.; HAN, B.; WANG, H.; XIANG, X.; ROHDIN, J.; SILNOVA, A.; QIAN, Y.; LI, H.

Original Title

Advancing speaker embedding learning: Wespeaker toolkit for research and production

Type

journal article in Web of Science

Language

English

Original Abstract

Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector representations, known as speaker embeddings, are the predominant modeling approach. These embeddings are typically evaluated within the framework of speaker verification, yet their utility extends to a broad scope of related tasks including speaker diarization, speech synthesis, voice conversion, and target speaker extraction. This paper presents Wespeaker, a user-friendly toolkit designed for both research and production purposes, dedicated to the learning of speaker embeddings. Wespeaker offers scalable data management, state-of-the-art speaker embedding models, and self-supervised learning training schemes with the potential to leverage large-scale unlabeled real-world data. The toolkit incorporates structured recipes that have been successfully adopted in winning systems across various speaker verification challenges, ensuring highly competitive results. For production-oriented development, Wespeaker integrates CPU- and GPU-compatible deployment and runtime codes, supporting mainstream platforms such as Windows, Linux, Mac and on-device chips such as horizon X3'PI. Wespeaker also provides off-the-shelf high-quality speaker embeddings by providing various pretrained models, which can be effortlessly applied to different tasks that require speaker modeling. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.

Keywords

Wespeaker; Speaker embedding learning; SSL; Open-source

Authors

WANG, S.; CHEN, Z.; HAN, B.; WANG, H.; XIANG, X.; ROHDIN, J.; SILNOVA, A.; QIAN, Y.; LI, H.

Released

1 July 2024

ISSN

0167-6393

Periodical

Speech Communication

Volume

162

Number

103104

Country

Kingdom of the Netherlands

Pages from

1

Pages to

12

Pages count

12

BibTex

@article{BUT193986,
  author="WANG, S. and CHEN, Z. and HAN, B. and WANG, H. and XIANG, X. and ROHDIN, J. and SILNOVA, A. and QIAN, Y. and LI, H.",
  title="Advancing speaker embedding learning: Wespeaker toolkit for research and production",
  journal="Speech Communication",
  year="2024",
  volume="162",
  number="103104",
  pages="1--12",
  doi="10.1016/j.specom.2024.103104",
  issn="0167-6393",
  url="https://pdf.sciencedirectassets.com/271578/1-s2.0-S0167639324X00060/1-s2.0-S0167639324000761/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAsaCXVzLWVhc3QtMSJIMEYCIQC8Doe66%2Bu6V%2FODd2NY6EZwVTEeN05avzWi09%2FPx3ob%2FQIhAP%2BOyz3L2hXSsDYY4l3zSuz1pzOjFiaTh%"
}
