Publication detail
Advancing speaker embedding learning: Wespeaker toolkit for research and production
WANG, S. CHEN, Z. HAN, B. WANG, H. XIANG, X. ROHDIN, J. SILNOVA, A. QIAN, Y. LI, H.
Original Title
Advancing speaker embedding learning: Wespeaker toolkit for research and production
Type
journal article in Web of Science
Language
English
Original Abstract
Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector representations, known as speaker embeddings, are the predominant modeling approach. These embeddings are typically evaluated within the framework of speaker verification, yet their utility extends to a broad scope of related tasks including speaker diarization, speech synthesis, voice conversion, and target speaker extraction. This paper presents Wespeaker, a user-friendly toolkit designed for both research and production purposes, dedicated to the learning of speaker embeddings. Wespeaker offers scalable data management, state-of-the-art speaker embedding models, and self-supervised learning training schemes with the potential to leverage large-scale unlabeled real-world data. The toolkit incorporates structured recipes that have been successfully adopted in winning systems across various speaker verification challenges, ensuring highly competitive results. For production-oriented development, Wespeaker integrates CPU- and GPU-compatible deployment and runtime code, supporting mainstream platforms such as Windows, Linux, and Mac, as well as on-device chips such as the Horizon X3 PI. Wespeaker also provides off-the-shelf high-quality speaker embeddings through various pretrained models, which can be effortlessly applied to different tasks that require speaker modeling. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
Keywords
Wespeaker; Speaker embedding learning; SSL; Open-source
Authors
WANG, S.; CHEN, Z.; HAN, B.; WANG, H.; XIANG, X.; ROHDIN, J.; SILNOVA, A.; QIAN, Y.; LI, H.
Released
1 July 2024
ISSN
0167-6393
Periodical
Speech Communication
Volume
162
Number
103104
Country
Kingdom of the Netherlands
Pages from
1
Pages to
12
Pages count
12
BibTeX
@article{BUT193986,
author="WANG, S. and CHEN, Z. and HAN, B. and WANG, H. and XIANG, X. and ROHDIN, J. and SILNOVA, A. and QIAN, Y. and LI, H.",
title="Advancing speaker embedding learning: Wespeaker toolkit for research and production",
journal="Speech Communication",
year="2024",
volume="162",
number="103104",
pages="1--12",
doi="10.1016/j.specom.2024.103104",
issn="0167-6393",
url="https://doi.org/10.1016/j.specom.2024.103104"
}