Project detail
Project period: 01.01.2019 - 31.12.2023
Funding sources
Czech Science Foundation - EXPRO Grant Projects of Excellence in Basic Research - 2019
- partial funding (2019-01-01 - 2023-12-31)
About the project
The NEUREM3 project encompasses basic research in speech processing (SP) and natural language processing (NLP) with an emphasis on multi-linguality and multi-modality (speech and text processing supported by visual information). Current deep machine learning methods rely on continuous vector representations that the neural networks (NNs) create themselves during training. Although such NNs often achieve excellent empirical results, our knowledge and understanding of these representations remain insufficient. NEUREM3 aims to fill this gap by studying neural representations of speech and text units of different scopes (from phonemes and letters to whole spoken and written documents), acquired both for isolated tasks and in multi-task setups. NEUREM3 will also improve NN architectures and training techniques so that they can be trained on incomplete or incoherent data.
Keywords: deep learning; machine learning; neural networks; continuous representations; natural language processing; speech and text processing; machine translation; multi-modality; multi-linguality
Identifier
GX19-26934X
Original language
English
Investigators
Burget Lukáš, doc. Ing., Ph.D. - principal investigator
Baskar Murali Karthick, Ing., Ph.D. - co-investigator
Beneš Karel, Ing. - co-investigator
Units
Department of Computer Graphics and Multimedia - recipient (08.06.2018 - 31.12.2023)
Charles University in Prague - co-recipient (08.06.2018 - 31.12.2023)
Results
ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O. End-to-end DNN based text-independent speaker recognition for long and short utterances. COMPUTER SPEECH AND LANGUAGE, 2020, vol. 2020, no. 59, p. 22-35. ISSN: 0885-2308.
NOVOTNÝ, O.; PLCHOT, O.; GLEMBEK, O.; ČERNOCKÝ, J.; BURGET, L. Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition. COMPUTER SPEECH AND LANGUAGE, 2019, vol. 2019, no. 58, p. 403-421. ISSN: 0885-2308.
ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; OCHIAI, T.; NAKATANI, T.; BURGET, L.; ČERNOCKÝ, J. SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures. IEEE J-STSP, 2019, vol. 13, no. 4, p. 800-814. ISSN: 1932-4553.
ONDEL YANG, L.; VYDANA, H.; BURGET, L.; ČERNOCKÝ, J. Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery. In Proceedings of Interspeech 2019. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. p. 261-265. ISSN: 1990-9772.
DIEZ SÁNCHEZ, M.; BURGET, L.; WANG, S.; ROHDIN, J.; ČERNOCKÝ, J. Bayesian HMM based x-vector clustering for Speaker Diarization. In Proceedings of Interspeech. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. p. 346-350. ISSN: 1990-9772.
WANG, S.; ROHDIN, J.; BURGET, L.; PLCHOT, O.; QIAN, Y.; YU, K.; ČERNOCKÝ, J. On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction. In Proceedings of Interspeech. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. p. 1148-1152. ISSN: 1990-9772.
MATĚJKA, P.; PLCHOT, O.; ZEINALI, H.; MOŠNER, L.; SILNOVA, A.; BURGET, L.; NOVOTNÝ, O.; GLEMBEK, O. Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. In Proceedings of Interspeech. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. p. 2448-2452. ISSN: 1990-9772.
NOVOTNÝ, O.; PLCHOT, O.; GLEMBEK, O.; BURGET, L. Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition. In Proceedings of Interspeech. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. p. 4330-4334. ISSN: 1990-9772.
STAFYLAKIS, T.; ROHDIN, J.; PLCHOT, O.; MIZERA, P.; BURGET, L. Self-supervised speaker embeddings. In Proceedings of Interspeech. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. p. 2863-2867. ISSN: 1990-9772.
DIEZ SÁNCHEZ, M.; BURGET, L.; LANDINI, F.; ČERNOCKÝ, J. Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2020, vol. 28, no. 1, p. 355-368. ISSN: 2329-9290.
ZEINALI, H.; ČERNOCKÝ, J.; BURGET, L. A multi purpose and large scale speech corpus in Persian and English for speaker and speech Recognition: the DeepMine database. In IEEE Automatic Speech Recognition and Understanding Workshop - Proceedings (ASRU). Sentosa, Singapore: IEEE Signal Processing Society, 2019. p. 397-402. ISBN: 978-1-7281-0306-8.
ALAM, J.; BOULIANNE, G.; BURGET, L.; GLEMBEK, O.; LOZANO DÍEZ, A.; MATĚJKA, P.; MIZERA, P.; MOŠNER, L.; NOVOTNÝ, O.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; SLAVÍČEK, J.; STAFYLAKIS, T.; WANG, S.; ZEINALI, H.; DAHMANE, M.; ST-CHARLES, P.; LALONDE, M.; NOISEUX, C.; MONTEIRO, J. ABC System Description for NIST Multimedia Speaker Recognition Evaluation 2019. Proceedings of NIST 2019 SRE Workshop. Sentosa, Singapore: National Institute of Standards and Technology, 2019. p. 1-7.
MATĚJKA, P.; PLCHOT, O.; GLEMBEK, O.; BURGET, L.; ROHDIN, J.; ZEINALI, H.; MOŠNER, L.; SILNOVA, A.; NOVOTNÝ, O.; DIEZ SÁNCHEZ, M.; ČERNOCKÝ, J. 13 years of speaker recognition research at BUT, with longitudinal analysis of NIST SRE. COMPUTER SPEECH AND LANGUAGE, 2020, vol. 2020, no. 63, p. 1-15. ISSN: 0885-2308.
WANG, S.; ROHDIN, J.; PLCHOT, O.; BURGET, L.; YU, K.; ČERNOCKÝ, J. Investigation of Specaugment for Deep Speaker Embedding Learning. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Barcelona: IEEE Signal Processing Society, 2020. p. 7139-7143. ISBN: 978-1-5090-6631-5.
SILNOVA, A.; BRUMMER, J.; ROHDIN, J.; STAFYLAKIS, T.; BURGET, L. Probabilistic embeddings for speaker diarization. Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop. Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland. Tokyo: International Speech Communication Association, 2020. p. 24-31. ISSN: 2312-2846.
ALAM, J.; BOULIANNE, G.; BURGET, L.; DAHMANE, M.; DIEZ SÁNCHEZ, M.; GLEMBEK, O.; LALONDE, M.; LOZANO DÍEZ, A.; MATĚJKA, P.; MIZERA, P.; MOŠNER, L.; NOISEUX, C.; MONTEIRO, J.; NOVOTNÝ, O.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; SLAVÍČEK, J.; STAFYLAKIS, T.; ST-CHARLES, P.; WANG, S.; ZEINALI, H. Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge. In Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop. Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland. Tokyo: International Speech Communication Association, 2020. p. 289-295. ISSN: 2312-2846.
LOZANO DÍEZ, A.; SILNOVA, A.; PULUGUNDLA, B.; ROHDIN, J.; VESELÝ, K.; BURGET, L.; PLCHOT, O.; GLEMBEK, O.; NOVOTNÝ, O.; MATĚJKA, P. BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Shanghai: International Speech Communication Association, 2020. p. 761-765. ISSN: 1990-9772.
ZULUAGA-GOMEZ, J.; MOTLÍČEK, P.; ZHAN, Q.; VESELÝ, K.; BRAUN, R. Automatic Speech Recognition Benchmark for Air-Traffic Communications. In Proceedings of Interspeech 2020. Proceedings of Interspeech. Shanghai: International Speech Communication Association, 2020. p. 2297-2301. ISSN: 1990-9772.
ZEINALI, H.; WANG, S.; SILNOVA, A.; MATĚJKA, P.; PLCHOT, O. BUT System Description to VoxCeleb Speaker Recognition Challenge 2019. Proceedings of The VoxCeleb Challenge Workshop 2019. Graz: 2019. p. 1-4.
LANDINI, F.; LOZANO DÍEZ, A.; BURGET, L.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; ŽMOLÍKOVÁ, K.; GLEMBEK, O.; MATĚJKA, P.; STAFYLAKIS, T.; BRUMMER, J. BUT System Description for The Third DIHARD Speech Diarization Challenge. Proceedings available at Dihard Challenge Github. on-line by LDC and University of Pennsylvania: 2021. p. 1-5.
BURGET, L.; GLEMBEK, O.; LOZANO DÍEZ, A.; MATĚJKA, P.; NOVOTNÝ, O.; PLCHOT, O.; PULUGUNDLA, B.; ROHDIN, J.; SILNOVA, A.; VESELÝ, K. BUT System Description to SdSV Challenge 2020. Proceedings of Short-duration Speaker Verification Challenge 2020 Workshop. Shanghai, on-line event of Interspeech 2020 Conference: 2020. p. 1-5.
KIŠŠ, M.; BENEŠ, K.; HRADIŠ, M. AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions. In Lladós J., Lopresti D., Uchida S. (eds) Document Analysis and Recognition - ICDAR 2021. Lecture Notes in Computer Science. Lausanne: Springer Nature Switzerland AG, 2021. p. 463-477. ISBN: 978-3-030-86336-4.
LANDINI, F.; GLEMBEK, O.; MATĚJKA, P.; ROHDIN, J.; BURGET, L.; DIEZ SÁNCHEZ, M.; SILNOVA, A. Analysis of the BUT Diarization System for Voxconverse Challenge. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021. p. 5819-5823. ISBN: 978-1-7281-7605-5.
VYDANA, H.; KARAFIÁT, M.; ŽMOLÍKOVÁ, K.; BURGET, L.; ČERNOCKÝ, J. Jointly Trained Transformers Models for Spoken Language Translation. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021. p. 7513-7517. ISBN: 978-1-7281-7605-5.
YUSUF, B.; ONDEL YANG, L.; BURGET, L.; ČERNOCKÝ, J.; SARAÇLAR, M. A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021. p. 3710-3714. ISBN: 978-1-7281-7605-5.
BASKAR, M.; BURGET, L.; WATANABE, S.; ASTUDILLO, R.; ČERNOCKÝ, J. Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021. p. 6753-6757. ISBN: 978-1-7281-7605-5.
KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J.; PROFANT, J.; NYTRA, J.; HLAVÁČEK, M.; PAVLÍČEK, T. Analysis of X-Vectors for Low-Resource Speech Recognition. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021. p. 6998-7002. ISBN: 978-1-7281-7605-5.
ŽMOLÍKOVÁ, K.; DELCROIX, M.; BURGET, L.; NAKATANI, T.; ČERNOCKÝ, J. Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation. In 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings. Shenzhen - virtual: IEEE Signal Processing Society, 2021. p. 889-896. ISBN: 978-1-7281-7066-4.
KOCOUR, M.; CÁMBARA, G.; LUQUE, J.; BONET, D.; FARRÚS, M.; KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge. Proceedings of IberSPEECH 2021. Valladolid: International Speech Communication Association, 2021. p. 113-117.
STAFYLAKIS, T.; ROHDIN, J.; BURGET, L. Speaker embeddings by modeling channel-wise correlations. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. p. 501-505. ISSN: 1990-9772.
PENG, J.; QU, X.; WANG, J.; GU, R.; XIAO, J.; BURGET, L.; ČERNOCKÝ, J. ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. p. 511-515. ISSN: 1990-9772.
BENEŠ, K.; BURGET, L. Text Augmentation for Language Models in High Error Recognition Scenario. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. p. 1872-1876. ISSN: 1990-9772.
PENG, J.; QU, X.; GU, R.; WANG, J.; XIAO, J.; BURGET, L.; ČERNOCKÝ, J. Effective Phase Encoding for End-To-End Speaker Verification. In Proceedings Interspeech 2021. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. p. 2366-2370. ISSN: 1990-9772.
EGOROVA, E.; VYDANA, H.; BURGET, L.; ČERNOCKÝ, J. Out-of-Vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System. In Proceedings Interspeech 2021. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. p. 2901-2905. ISSN: 1990-9772.
LANDINI, F.; PROFANT, J.; DIEZ SÁNCHEZ, M.; BURGET, L. Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks. COMPUTER SPEECH AND LANGUAGE, 2022, vol. 71, no. 101254, p. 1-16. ISSN: 0885-2308.
BURGET, L.; BOJAR, O. NEUREM3 Interim Research Report. Brno: Department of Computer Graphics and Multimedia FIT BUT, 2022. p. 1-78.
VYDANA, H.; KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J. The IWSLT 2021 BUT Speech Translation Systems. In Proceedings of 18th International Conference on Spoken Language Translation (IWSLT). Bangkok, on-line: Association for Computational Linguistics, 2021. p. 75-83. ISBN: 978-1-7138-3378-9.
KIŠŠ, M.; KOHÚT, J.; BENEŠ, K.; HRADIŠ, M. Importance of Textlines in Historical Document Classification. In Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. Lecture Notes in Computer Science. La Rochelle: Springer Nature Switzerland AG, 2022. p. 158-170. ISBN: 978-3-031-06554-5.
MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Multisv: Dataset for Far-Field Multi-Channel Speaker Verification. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022. p. 7977-7981. ISBN: 978-1-6654-0540-9.
MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speaker Verification with Conv-Tasnet Based Beamformer. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022. p. 7982-7986. ISBN: 978-1-6654-0540-9.
HAN, J.; LONG, Y.; BURGET, L.; ČERNOCKÝ, J. DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022. p. 7292-7296. ISBN: 978-1-6654-0540-9.
ONDEL YANG, L.; YUSUF, B.; BURGET, L.; SARAÇLAR, M. Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2022, vol. 30, no. 5, p. 1902-1917. ISSN: 2329-9290.
EGOROVA, E.; VYDANA, H.; BURGET, L.; ČERNOCKÝ, J. Spelling-Aware Word-Based End-to-End ASR. IEEE SIGNAL PROCESSING LETTERS, 2022, vol. 29, no. 29, p. 1729-1733. ISSN: 1558-2361.
SILNOVA, A.; STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O.; BRUMMER, J. Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 9-16.
PENG, J.; ZHANG, C.; ČERNOCKÝ, J.; YU, D. Progressive contrastive learning for self-supervised text-independent speaker verification. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 17-24.
BRUMMER, J.; SWART, A.; MOŠNER, L.; SILNOVA, A.; PLCHOT, O.; STAFYLAKIS, T.; BURGET, L. Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 1446-1450. ISSN: 1990-9772.
ALAM, J.; BURGET, L.; GLEMBEK, O.; MATĚJKA, P.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; STAFYLAKIS, T. Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 346-353.
LANDINI, F.; LOZANO DÍEZ, A.; DIEZ SÁNCHEZ, M.; BURGET, L. From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 5095-5099. ISSN: 1990-9772.
STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; BURGET, L.; ČERNOCKÝ, J. Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 605-609. ISSN: 1990-9772.
PENG, J.; GU, R.; MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Learnable Sparse Filterbank for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 5110-5114. ISSN: 1990-9772.
BASKAR, M.; HERZIG, T.; NGUYEN, D.; DIEZ SÁNCHEZ, M.; POLZEHL, T.; BURGET, L.; ČERNOCKÝ, J. Speaker adaptation for Wav2vec2 based dysarthric ASR. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 3403-3407. ISSN: 1990-9772.
KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022. p. 276-280.
NADIMPALLI, V.; KESIRAJU, S.; BANKA, R.; KETHIREDDY, R.; GANGASHETTY, S. Resources and Benchmarks for Keyword Search in Spoken Audio From Low-Resource Indian Languages. IEEE Access, 2022, vol. 10, no. 2022, p. 34789-34799. ISSN: 2169-3536.
PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 555-562. ISBN: 978-1-6654-7189-3.
STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 1136-1143. ISBN: 978-1-6654-7189-3.
SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; KLČO, M.; PLCHOT, O.; MATĚJKA, P.; PENG, J.; STAFYLAKIS, T.; BURGET, L. ABC System Description for NIST LRE 2022. Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023. p. 1-5.
LANDINI, F.; DIEZ SÁNCHEZ, M.; LOZANO DÍEZ, A.; BURGET, L. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
KESIRAJU, S.; BENEŠ, K.; TIKHONOV, M.; ČERNOCKÝ, J. BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference. Toronto (in-person and online): Association for Computational Linguistics, 2023. p. 227-234. ISBN: 978-1-959429-84-5.
SILNOVA, A.; BRUMMER, J.; SWART, A.; BURGET, L. Toroidal Probabilistic Spherical Discriminant Analysis. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
PENG, J.; STAFYLAKIS, T.; GU, R.; PLCHOT, O.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
KAKOUROS, S.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
YUSUF, B.; ČERNOCKÝ, J.; SARAÇLAR, M. End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2023, vol. 31, no. 08, p. 3070-3080. ISSN: 2329-9290.
YU, D.; GONG, Y.; PICHENY, A.; RAMABHADRAN, B.; HAKKANI-TÜR, D.; PRASAD, R.; ZEN, H.; SKOGLUND, J.; ČERNOCKÝ, J.; BURGET, L.; MOHAMED, A. Twenty-Five Years of Evolution in Speech and Language Processing. IEEE SIGNAL PROCESSING MAGAZINE, 2023, vol. 40, no. 5, p. 27-39. ISSN: 1558-0792.
MOŠNER, L.; PLCHOT, O.; PENG, J.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 1693-1697. ISSN: 1990-9772.
KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 2148-2152. ISSN: 1990-9772.
DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S. Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 3477-3481. ISSN: 1990-9772.
MATĚJKA, P.; SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; PLCHOT, O.; KLČO, M.; PENG, J.; STAFYLAKIS, T.; BURGET, L. Description and Analysis of ABC Submission to NIST LRE 2022. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 511-515. ISSN: 1990-9772.
PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Improving Speaker Verification with Self-Pretrained Transformer Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 5361-5365. ISSN: 1990-9772.
BENEŠ, K.; KOCOUR, M.; BURGET, L. Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11276-11280. ISBN: 979-8-3503-4485-1.
HAN, J.; LANDINI, F.; ROHDIN, J.; DIEZ SÁNCHEZ, M.; BURGET, L.; CAO, Y.; LU, H.; ČERNOCKÝ, J. Diacorrect: Error Correction Back-End for Speaker Diarization. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul: IEEE Signal Processing Society, 2024. p. 11181-11185. ISBN: 979-8-3503-4485-1.
PENG, J.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Target Speech Extraction with Pre-Trained Self-Supervised Learning Models. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 10421-10425. ISBN: 979-8-3503-4485-1.
PENG, J.; DELCROIX, M.; OCHIAI, T.; ASHIHARA, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Probing Self-Supervised Learning Models With Target Speech Extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 535-539. ISBN: 979-8-3503-7451-3.
KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N. Discriminative Training of VBx Diarization. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11871-11875. ISBN: 979-8-3503-4485-1.
LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, 2024, vol. 32, no. 7, p. 3450-3465. ISSN: 1558-7916.
DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L. x-vectors Diarization (aka VBx); Bayesian HMM based x-vector clustering - VBx. URL: https://github.com/BUTSpeechFIT/VBx. (software)