Project detail
Neural Representations in multi-modal and multi-lingual modeling
Duration: 1.1.2019 — 31.12.2023
Funding resources
Grantová agentura České republiky - Grantové projekty exelence v základním výzkumu EXPRO - 2019
On the project
The NEUREM3 project encompasses basic research in speech processing (SP) and natural language processing (NLP) with accent on multi-linguality and multi-modality (speech and text processing with the support of visual information). Current deep machine learning methods are based on continuous vector representations that are created by the neural networks (NN) themselves during the training. Although empirically, the results of such NNs are often excellent, our knowledge and understanding of such representations is insufficient. NEUREM3 has an ambition to fill this gap and to study neural representations for speech and text units of different scopes (from phonemes and letters to whole spoken and written documents) and representations acquired both for isolated tasks and multi-task setups. NEUREM3 will also improve NN architectures and training techniques, so that they can be trained on incomplete or incoherent data.
Description in Czech
Projekt NEUREM3 spojuje základní výzkum v oblasti zpracování mluvené řeči (speech
processing, SP) a přirozeného jazyka (natural language processing, NLP) s důrazem
na vícejazyčnost a multi-modalitu (zpracování řeči a textu s podporou obrazové
informace). V jádru současných metod hlubokého strojového učení leží spojité
vektorové reprezentace, které si neuronové samy budují během trénování. Ačkoli
empiricky dosahují neuronové sítě často vynikajících výsledků, znalosti
a pochopení získaných reprezentací jsou nedostatečné. NEUREM3 má ambici tuto
mezeru vyplnit a studovat neuronové reprezentace pro jednotky textu a řeči
různého rozsahu (od fonémů a písmen až po proslovy a dokumenty) a reprezentace
získané pro izolované úlohy i více úloh současně (multi-tasking). NEUREM3 vylepší
architektury i techniky trénování neuronových sítí, aby je bylo možné trénovat je
na neúplných nebo nekoherentních datech.
Keywords
deep learning;machine learning;neural networks;continuous representations;natural
language processing;speech and text processing;machine
translation;multi-modality;multi-linguality
Key words in Czech
hluboké strojové učení;neuronové sítě;spojité reprezentace;zpracování přirozeného
jazyka;zpracování řeči a textu;strojový překlad; multimodalita;mnohojazyčnost
Mark
GX19-26934X
Default language
English
People responsible
Burget Lukáš, doc. Ing., Ph.D. - principal person responsible
Baskar Murali Karthick, Ing., Ph.D. - fellow researcher
Beneš Karel, Ing. - fellow researcher
Han Jiangyu - fellow researcher
Kesiraju Santosh, Ph.D. - fellow researcher
Peng Junyi - fellow researcher
Plchot Oldřich, Ing., Ph.D. - fellow researcher
Rohdin Johan Andréas, M.Sc., Ph.D. - fellow researcher
Sarvaš Marek, Ing. - fellow researcher
Units
Department of Computer Graphics and Multimedia
- responsible department (8.6.2018 - not assigned)
Speech Data Mining Research Group BUT Speech@FIT
- internal (8.6.2018 - 31.12.2023)
Department of Computer Graphics and Multimedia
- beneficiary (8.6.2018 - 31.12.2023)
Results
LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, 2024, vol. 32, no. 7, p. 3450-3465. ISSN: 1558-7916.
Detail
KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N. Discriminative Training of VBx Diarization. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11871-11875. ISBN: 979-8-3503-4485-1.
Detail
PENG, J.; DELCROIX, M.; OCHIAI, T.; ASHIHARA, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Probing Self-Supervised Learning Models With Target Speech Extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 535-539. ISBN: 979-8-3503-7451-3.
Detail
PENG, J.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Target Speech Extraction with Pre-Trained Self-Supervised Learning Models. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 10421-10425. ISBN: 979-8-3503-4485-1.
Detail
HAN, J.; LANDINI, F.; ROHDIN, J.; DIEZ SÁNCHEZ, M.; BURGET, L.; CAO, Y.; LU, H.; ČERNOCKÝ, J. Diacorrect: Error Correction Back-End for Speaker Diarization. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul: IEEE Signal Processing Society, 2024. p. 11181-11185. ISBN: 979-8-3503-4485-1.
Detail
BENEŠ, K.; KOCOUR, M.; BURGET, L. Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11276-11280. ISBN: 979-8-3503-4485-1.
Detail
PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Improving Speaker Verification with Self-Pretrained Transformer Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 5361-5365. ISSN: 1990-9772.
Detail
MATĚJKA, P.; SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; PLCHOT, O.; KLČO, M.; PENG, J.; STAFYLAKIS, T.; BURGET, L. Description and Analysis of ABC Submission to NIST LRE 2022. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 511-515. ISSN: 1990-9772.
Detail
DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S. Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 3477-3481. ISSN: 1990-9772.
Detail
KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 2148-2152. ISSN: 1990-9772.
Detail
MOŠNER, L.; PLCHOT, O.; PENG, J.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 1693-1697. ISSN: 1990-9772.
Detail
YU, D.; GONG, Y.; PICHENY, A.; RAMABHADRAN, B.; HAKKANI-TÜR, D.; PRASAD, R.; ZEN, H.; SKOGLUND, J.; ČERNOCKÝ, J.; BURGET, L.; MOHAMED, A. Twenty-Five Years of Evolution in Speech and Language Processing. IEEE SIGNAL PROCESSING MAGAZINE, 2023, vol. 40, no. 5, p. 27-39. ISSN: 1558-0792.
Detail
YUSUF, B.; ČERNOCKÝ, J.; SARAÇLAR, M. End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2023, vol. 31, no. 08, p. 3070-3080. ISSN: 2329-9290.
Detail
KAKOUROS, S.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
PENG, J.; STAFYLAKIS, T.; GU, R.; PLCHOT, O.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
SILNOVA, A.; BRUMMER, J.; SWART, A.; BURGET, L. Toroidal Probabilistic Spherical Discriminant Analysis. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
KESIRAJU, S.; BENEŠ, K.; TIKHONOV, M.; ČERNOCKÝ, J. BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference. Toronto (in-person and online): Association for Computational Linguistics, 2023. p. 227-234. ISBN: 978-1-959429-84-5.
Detail
LANDINI, F.; DIEZ SÁNCHEZ, M.; LOZANO DÍEZ, A.; BURGET, L. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; KLČO, M.; PLCHOT, O.; MATĚJKA, P.; PENG, J.; STAFYLAKIS, T.; BURGET, L. ABC System Description for NIST LRE 2022. Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023. p. 1-5.
Detail
STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 1136-1143. ISBN: 978-1-6654-7189-3.
Detail
Responsibility: Burget Lukáš, doc. Ing., Ph.D.