Publication detail

BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge

POLOK, A.; KLEMENT, D.; HAN, J.; SEDLÁČEK, Š.; YUSUF, B.; MACIEJEWSKI, M.; WIESNER, M.; BURGET, L.

Original title

BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge

Type

Paper in conference proceedings (not indexed in WoS or Scopus)

Language

English

Original abstract

This paper presents our method for tackling the CHiME-8 challenge's NOTSOFAR-1 task, which requires participants to perform multi-speaker automatic speech recognition (ASR) using audio from distant microphone arrays. We modify the Pyannote3 diarization pipeline, incorporating a pre-trained WavLM model as local EEND to adapt effectively to new domains, and we introduce two diarization-aware approaches to ASR that condition Whisper on diarization outputs for target-speaker ASR. The first method, which we refer to as Query-Key Biasing, modifies Whisper's attention mechanism and positional embeddings with a learnable attention mask that excludes non-target speaker segments in the audio. The second method, called Frame-Level Diarization-Dependent Transformations, applies affine, diarization-dependent transformations with trainable parameters to the inputs of one or more transformer blocks. We also extend both the ASR and diarization systems to a multichannel setup by incorporating cross-channel communication into our models. Finally, we report the performance of these approaches on the NOTSOFAR-1 dataset.
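The two diarization-conditioning ideas in the abstract can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation. Several details are assumptions made for illustration only: the four frame-level classes (silence, target, non-target, overlap), mixing per-class affine maps weighted by their frame probabilities, and an additive attention-logit bias scaled by (1 - target probability). The sketch also omits the positional-embedding part of Query-Key Biasing and treats the learnable bias as a fixed number. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical frame-level diarization classes (an assumption, not from the abstract).
CLASSES = ("silence", "target", "non_target", "overlap")


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W @ x + b for a single frame."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def fddt_frame(x: Vector, probs: Sequence[float],
               Ws: Sequence[Matrix], bs: Sequence[Vector]) -> Vector:
    """Diarization-dependent transform of one frame (assumed form):
    sum over classes c of p_c * (W_c x + b_c). In a trained model,
    W_c and b_c would be learnable parameters."""
    out = [0.0] * len(x)
    for p_c, W_c, b_c in zip(probs, Ws, bs):
        y = affine(W_c, b_c, x)
        out = [o + p_c * y_i for o, y_i in zip(out, y)]
    return out


def qk_biased_attention(q: Vector, keys: Sequence[Vector],
                        target_prob: Sequence[float], bias: float) -> Vector:
    """Softmax attention weights of one query over all keys. An additive
    bias, scaled by (1 - target probability), is applied to each key's
    logit; a large negative bias effectively masks non-target frames."""
    d = len(q)
    logits = [
        sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + bias * (1.0 - p)
        for k, p in zip(keys, target_prob)
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    I2 = [[1.0, 0.0], [0.0, 1.0]]
    Z2 = [[0.0, 0.0], [0.0, 0.0]]
    zero = [0.0, 0.0]
    # Silence and non-target map to zero; target and overlap pass through unchanged.
    Ws, bs = [Z2, I2, Z2, I2], [zero] * 4
    print(fddt_frame([2.0, -1.0], [0.5, 0.5, 0.0, 0.0], Ws, bs))
    print(qk_biased_attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]],
                              [1.0, 0.0], bias=-1e9))
```

With a frame that is half silence and half target, the mixed transform scales the input by 0.5. With a large negative bias, attention mass moves almost entirely to the key marked as target speech.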

Keywords

multi-talker speech recognition, CHiME-8, NOTSOFAR-1, target-speaker

Authors

POLOK, A.; KLEMENT, D.; HAN, J.; SEDLÁČEK, Š.; YUSUF, B.; MACIEJEWSKI, M.; WIESNER, M.; BURGET, L.

Published

6 September 2024

Publisher

International Speech Communication Association

Place

Kos Island

First page

18

Last page

22

Number of pages

5

URL

https://www.isca-archive.org/chime_2024/polok24_chime.pdf

BibTeX

@inproceedings{BUT194002,
  author="Alexander {Polok} and Dominik {Klement} and Jiangyu {Han} and Šimon {Sedláček} and Bolaji {Yusuf} and Matthew {Maciejewski} and Matthew {Wiesner} and Lukáš {Burget}",
  title="BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge",
  booktitle="Proceedings of CHiME 2024 Workshop",
  year="2024",
  pages="18--22",
  publisher="International Speech Communication Association",
  address="Kos Island",
  doi="10.21437/CHiME.2024-4",
  url="https://www.isca-archive.org/chime_2024/polok24_chime.pdf"
}
