Publication detail
BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge
POLOK, A. KLEMENT, D. HAN, J. SEDLÁČEK, Š. YUSUF, B. MACIEJEWSKI, M. WIESNER, M. BURGET, L.
Original title
BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge
Type
paper in proceedings not indexed in WoS or Scopus
Language
English
Original abstract
This paper presents our method for tackling the CHiME-8 challenge's NOTSOFAR-1 task, which requires participants to perform multi-speaker automatic speech recognition (ASR) using audio from distant microphone arrays. We modify the Pyannote3 diarization pipeline, incorporating pre-trained WavLM as local EEND to adapt effectively to new domains, and we introduce two diarization-aware approaches to ASR by conditioning Whisper on diarization outputs for target-speaker ASR. The first method, which we refer to as Query-Key Biasing, modifies Whisper's attention mechanism and positional embeddings with a learnable attention mask to exclude non-target speaker segments in the audio. The second method, called Frame-Level Diarization-Dependent Transformations, applies affine, diarization-dependent transformations with trainable parameters to the inputs of one or more transformer blocks. We also extend both the ASR and diarization systems to a multichannel setup by incorporating cross-channel communication into our models. Finally, we report the performance of these approaches on the NOTSOFAR-1 dataset.
Keywords
multi-talker speech recognition, CHiME-8, NOTSOFAR-1, target-speaker
Authors
POLOK, A.; KLEMENT, D.; HAN, J.; SEDLÁČEK, Š.; YUSUF, B.; MACIEJEWSKI, M.; WIESNER, M.; BURGET, L.
Published
6 September 2024
Publisher
International Speech Communication Association
Place
Kos Island
Pages from
18
Pages to
22
Number of pages
5
URL
BibTeX
@inproceedings{BUT194002,
author="Alexander {Polok} and Dominik {Klement} and Jiangyu {Han} and Šimon {Sedláček} and Bolaji {Yusuf} and Matthew {Maciejewski} and Matthew {Wiesner} and Lukáš {Burget}",
title="BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge",
booktitle="Proceedings of CHiME 2024 Workshop",
year="2024",
pages="18--22",
publisher="International Speech Communication Association",
address="Kos Island",
doi="10.21437/CHiME.2024-4",
url="https://www.isca-archive.org/chime_2024/polok24_chime.pdf"
}
Documents