Detail projektu
Neural Representations in multi-modal and multi-lingual modeling
Období řešení: 1.1.2019 — 31.12.2023
Zdroje financování
Grantová agentura České republiky - Grantové projekty exelence v základním výzkumu EXPRO - 2019
O projektu
The NEUREM3 project encompasses basic research in speech processing (SP) and natural language processing (NLP) with accent on multi-linguality and multi-modality (speech and text processing with the support of visual information). Current deep machine learning methods are based on continuous vector representations that are created by the neural networks (NN) themselves during the training. Although empirically, the results of such NNs are often excellent, our knowledge and understanding of such representations is insufficient. NEUREM3 has an ambition to fill this gap and to study neural representations for speech and text units of different scopes (from phonemes and letters to whole spoken and written documents) and representations acquired both for isolated tasks and multi-task setups. NEUREM3 will also improve NN architectures and training techniques, so that they can be trained on incomplete or incoherent data.
Popis česky
Projekt NEUREM3 spojuje základní výzkum v oblasti zpracování mluvené řeči (speech
processing, SP) a přirozeného jazyka (natural language processing, NLP) s důrazem
na vícejazyčnost a multi-modalitu (zpracování řeči a textu s podporou obrazové
informace). V jádru současných metod hlubokého strojového učení leží spojité
vektorové reprezentace, které si neuronové samy budují během trénování. Ačkoli
empiricky dosahují neuronové sítě často vynikajících výsledků, znalosti
a pochopení získaných reprezentací jsou nedostatečné. NEUREM3 má ambici tuto
mezeru vyplnit a studovat neuronové reprezentace pro jednotky textu a řeči
různého rozsahu (od fonémů a písmen až po proslovy a dokumenty) a reprezentace
získané pro izolované úlohy i více úloh současně (multi-tasking). NEUREM3 vylepší
architektury i techniky trénování neuronových sítí, aby je bylo možné trénovat je
na neúplných nebo nekoherentních datech.
Klíčová slova
deep learning;machine learning;neural networks;continuous representations;natural
language processing;speech and text processing;machine
translation;multi-modality;multi-linguality
Klíčová slova česky
hluboké strojové učení;neuronové sítě;spojité reprezentace;zpracování přirozeného
jazyka;zpracování řeči a textu;strojový překlad; multimodalita;mnohojazyčnost
Označení
GX19-26934X
Originální jazyk
angličtina
Řešitelé
Burget Lukáš, doc. Ing., Ph.D. - hlavní řešitel
Baskar Murali Karthick, Ing., Ph.D. - spoluřešitel
Beneš Karel, Ing. - spoluřešitel
Han Jiangyu - spoluřešitel
Kesiraju Santosh, Ph.D. - spoluřešitel
Peng Junyi - spoluřešitel
Plchot Oldřich, Ing., Ph.D. - spoluřešitel
Rohdin Johan Andréas, M.Sc., Ph.D. - spoluřešitel
Sarvaš Marek, Ing. - spoluřešitel
Útvary
Ústav počítačové grafiky a multimédií
- odpovědné pracoviště (8.6.2018 - nezadáno)
Výzkumná skupina dolování dat z řeči BUT Speech@FIT
- interní (8.6.2018 - 31.12.2023)
Univerzita Karlova v Praze
- spolupříjemce (8.6.2018 - 31.12.2023)
Ústav počítačové grafiky a multimédií
- příjemce (8.6.2018 - 31.12.2023)
Výsledky
LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, 2024, vol. 32, no. 7, p. 3450-3465. ISSN: 1558-7916.
Detail
KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N. Discriminative Training of VBx Diarization. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11871-11875. ISBN: 979-8-3503-4485-1.
Detail
PENG, J.; DELCROIX, M.; OCHIAI, T.; ASHIHARA, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Probing Self-Supervised Learning Models With Target Speech Extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 535-539. ISBN: 979-8-3503-7451-3.
Detail
PENG, J.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Target Speech Extraction with Pre-Trained Self-Supervised Learning Models. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 10421-10425. ISBN: 979-8-3503-4485-1.
Detail
HAN, J.; LANDINI, F.; ROHDIN, J.; DIEZ SÁNCHEZ, M.; BURGET, L.; CAO, Y.; LU, H.; ČERNOCKÝ, J. Diacorrect: Error Correction Back-End for Speaker Diarization. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul: IEEE Signal Processing Society, 2024. p. 11181-11185. ISBN: 979-8-3503-4485-1.
Detail
BENEŠ, K.; KOCOUR, M.; BURGET, L. Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11276-11280. ISBN: 979-8-3503-4485-1.
Detail
PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Improving Speaker Verification with Self-Pretrained Transformer Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 5361-5365. ISSN: 1990-9772.
Detail
MATĚJKA, P.; SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; PLCHOT, O.; KLČO, M.; PENG, J.; STAFYLAKIS, T.; BURGET, L. Description and Analysis of ABC Submission to NIST LRE 2022. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 511-515. ISSN: 1990-9772.
Detail
DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S. Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 3477-3481. ISSN: 1990-9772.
Detail
KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 2148-2152. ISSN: 1990-9772.
Detail
MOŠNER, L.; PLCHOT, O.; PENG, J.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 1693-1697. ISSN: 1990-9772.
Detail
YU, D.; GONG, Y.; PICHENY, A.; RAMABHADRAN, B.; HAKKANI-TÜR, D.; PRASAD, R.; ZEN, H.; SKOGLUND, J.; ČERNOCKÝ, J.; BURGET, L.; MOHAMED, A. Twenty-Five Years of Evolution in Speech and Language Processing. IEEE SIGNAL PROCESSING MAGAZINE, 2023, vol. 40, no. 5, p. 27-39. ISSN: 1558-0792.
Detail
YUSUF, B.; ČERNOCKÝ, J.; SARAÇLAR, M. End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2023, vol. 31, no. 08, p. 3070-3080. ISSN: 2329-9290.
Detail
KAKOUROS, S.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
PENG, J.; STAFYLAKIS, T.; GU, R.; PLCHOT, O.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
SILNOVA, A.; BRUMMER, J.; SWART, A.; BURGET, L. Toroidal Probabilistic Spherical Discriminant Analysis. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
KESIRAJU, S.; BENEŠ, K.; TIKHONOV, M.; ČERNOCKÝ, J. BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference. Toronto (in-person and online): Association for Computational Linguistics, 2023. p. 227-234. ISBN: 978-1-959429-84-5.
Detail
LANDINI, F.; DIEZ SÁNCHEZ, M.; LOZANO DÍEZ, A.; BURGET, L. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Detail
SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; KLČO, M.; PLCHOT, O.; MATĚJKA, P.; PENG, J.; STAFYLAKIS, T.; BURGET, L. ABC System Description for NIST LRE 2022. Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023. p. 1-5.
Detail
STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 1136-1143. ISBN: 978-1-6654-7189-3.
Detail
Odpovědnost: Burget Lukáš, doc. Ing., Ph.D.