Přístupnostní navigace
E-přihláška
Vyhledávání Vyhledat Zavřít
Detail projektu
Období řešení: 01.01.2011 — 31.12.2014
Zdroje financování
Technologická agentura ČR - Program aplikovaného výzkumu a experimentálního vývoje ALFA
- plně financující (2011-01-01 - 2014-12-31)
O projektu
Cílem projektu je vyvinout pokročilé techniky pro rozpoznávání řeči a nasadit je v praktických aplikacích: vyhledávání v elektronickém slovníku na mobilních zařízeních, diktování překladů, v bezpečnosti a obraně, v dialogových systémech, systémech péče o zákazníky (CRM, helpdesk apod.) a v audiovizuálním přístupu k výukovým materiálům.
Popis anglickyProject aims at development of advanced techniques in speech recognition and their deployment in the functional applications: search in electronic dictionaries on mobile devices, dictating translations, in defense and security, in dialogue systems, in client-care systems (CRM, helpdesk etc.) and in audio-visual access to teaching materials.
Klíčová slovarozpoznávání řeči, elektronické slovníky, obrana a bezpečnost, mobilní zařízení, dialogové systémy, CRM, eLearning
Klíčová slova anglickyspeech recognition, electronic dictionaries, defense and security, mobile devices, dialogue systems, CRM, eLearning
Označení
TA01011328
Originální jazyk
čeština
Řešitelé
Černocký Jan, prof. Dr. Ing. - hlavní řešitelKarafiát Martin, Ing., Ph.D. - spoluřešitelOndel Lucas Antoine Francois, Mgr., Ph.D. - spoluřešitel
Útvary
Ústav počítačové grafiky a multimédií- příjemce (01.02.2011 - 31.12.2014)
Výsledky
DEORAS, A.; MIKOLOV, T.; CHURCH, K. A Fast Re-scoring Strategy to Capture Long-Distance Dependencies. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing July 2011 Edinburgh, Scotland, UK. Edinburgh: Association for Computational Linguistics, 2011. p. 1116-1127. ISBN: 978-1-937284-11-4.Detail
POVEY, D.; GHOSHAL, A.; BOULIANNE, G.; BURGET, L.; GLEMBEK, O.; GOEL, N.; HANNEMANN, M.; MOTLÍČEK, P.; QIAN, Y.; SCHWARZ, P.; SILOVSKÝ, J.; STEMMER, G.; VESELÝ, K. The Kaldi Speech Recognition Toolkit. Proceedings of ASRU 2011. Hilton Waikoloa Village Resort, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.Detail
MIKOLOV, T.; DEORAS, A.; KOMBRINK, S.; BURGET, L.; ČERNOCKÝ, J. Empirical Evaluation and Combination of Advanced Language Modeling Techniques. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 605-608. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.Detail
KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Recurrent Neural Network based Language Modeling in Meeting Recognition. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 2877-2880. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.Detail
KARAFIÁT, M.; BURGET, L.; MATĚJKA, P.; GLEMBEK, O.; ČERNOCKÝ, J. iVector-Based Discriminative Adaptation for Automatic Speech Recognition. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 152-157. ISBN: 978-1-4673-0366-8.Detail
VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F. Convolutive Bottleneck Network Features for LVCSR. Proceedings of ASRU 2011. Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 42-47. ISBN: 978-1-4673-0366-8.Detail
GRÉZL, F. The Role of Neural Network Size in TRAP/HATS Feature Extraction. Proceedings Text, Speech and Dialogue 2011. Lecture Notes in Computer Science. LNAI 6836. Plzeň: Springer Verlag, 2011. p. 315-322. ISBN: 978-3-642-23537-5. ISSN: 0302-9743.Detail
MIKOLOV, T.; DEORAS, A.; POVEY, D.; BURGET, L.; ČERNOCKÝ, J. Strategies for Training Large Scale Neural Network Language Models. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 196-201. ISBN: 978-1-4673-0366-8.Detail
SOUFIFAR, M.; CUMANI, S.; BURGET, L.; ČERNOCKÝ, J. Discriminative Classifiers for Phonotactic Language Recognition with iVectors. Proc. International Conference on Acoustics, Speech, and Signal Processing 2012. Kyoto: IEEE Signal Processing Society, 2012. p. 4853-4856. ISBN: 978-1-4673-0044-5.Detail
POVEY, D.; HANNEMANN, M.; BOULIANNE, G.; BURGET, L.; GHOSHAL, A.; JANDA, M.; KARAFIÁT, M.; KOMBRINK, S.; MOTLÍČEK, P.; QIAN, Y.; RIEDHAMMER, K.; VESELÝ, K.; VU, N. Generating Exact Lattices in The WFST Framework. Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012. p. 4213-4216. ISBN: 978-1-4673-0044-5.Detail
KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Improving Language Models for ASR Using Translated In-domain Data. Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012. p. 4405-4408. ISBN: 978-1-4673-0044-5.Detail
KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J.; BURGET, L. Region Dependent Linear Transforms in Multilingual Speech Recognition. In Proc. International Conference on Acoustics, Speech, and Signal Processing 2012. Kyoto: IEEE Signal Processing Society, 2012. p. 4885-4888. ISBN: 978-1-4673-0044-5.Detail
CUMANI, S.; PLCHOT, O.; KARAFIÁT, M. Independent Component Analysis and MLLR Transforms for Speaker Identification. Proc. International Conference on Acoustics, Speech, and Signal P. Kyoto: IEEE Signal Processing Society, 2012. p. 4365-4368. ISBN: 978-1-4673-0044-5.Detail
MIKOLOV, T.; KOMBRINK, S.; DEORAS, A.; BURGET, L.; ČERNOCKÝ, J. RNNLM - Recurrent Neural Network Language Modeling Toolkit. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.Detail
RATH, S.; KARAFIÁT, M.; GLEMBEK, O.; ČERNOCKÝ, J. A factorized representation of FMLLR transform based on QR-decomposition. Proceedings of Interspeech 2012. Proceedings of Interspeech. Portland, Oregon: International Speech Communication Association, 2012. p. 1-4. ISBN: 978-1-62276-759-5. ISSN: 1990-9772.Detail
VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F.; JANDA, M.; EGOROVA, E. The Language-Independent Bottleneck Features. Proceedings of IEEE 2012 Workshop on Spoken Language Technology. Miami: IEEE Signal Processing Society, 2012. p. 336-341. ISBN: 978-1-4673-5124-9.Detail
DEORAS, A.; MIKOLOV, T.; KOMBRINK, S.; CHURCH, K. Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model. Speech Communication, 2012, vol. 2012, no. 8, p. 1-16. ISSN: 0167-6393.Detail
SZŐKE, I.; FAPŠO, M.; ŽIŽKA, J.; BERAN, V.; ČERNOCKÝ, J. Efektivní přístup ke znalostem v audio-vizuálních záznamech. Proceedings of the Annual Database Conference. Praha: Technická univerzita v Košiciach, 2012. s. 57-74. ISBN: 978-80-553-1049-7.Detail
SZŐKE, I.; FAPŠO, M.; VESELÝ, K. BUT2012 přístup pro Spoken Web Search úkol na MediaEval2012. Working Notes Proceedings of the MediaEval 2012 Workshop. CEUR Workshop Proceedings. Pisa: CEUR-WS.org, 2012. s. 1-2. ISSN: 1613-0073.Detail
GRÉZL, F.; KARAFIÁT, M. Integrating recent MLP feature extraction techniques into TRAP architecture. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 1229-1232. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.Detail
ONDEL YANG, L.; ANGUERA, X.; LUQUE, J. MASK+:Data-Driven Regions Selection for Acoustic Fingerprinting. In Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, Queensland: IEEE Signal Processing Society, 2015. p. 335-339. ISBN: 978-1-4673-6997-8.Detail
PLCHOT, O.; MATSOUKAS, S.; MATĚJKA, P.; DEHAK, N.; MA, J.; CUMANI, S.; GLEMBEK, O.; HEŘMANSKÝ, H.; MESGARANI, N.; SOUFIFAR, M.; THOMAS, S.; ZHANG, B.; ZHOU, X. Developing A Speaker Identification System For The DARPA RATS Project. Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 6768-6772. ISBN: 978-1-4799-0355-9.Detail
EGOROVA, E.; VESELÝ, K.; KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J. Manual and Semi-Automatic Approaches to Building a Multilingual Phoneme Set. In Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 7324-7328. ISBN: 978-1-4799-0355-9.Detail
LEI, Y.; BURGET, L.; SCHEFFER, N. A Noise Robust I-Vector Extractor Using Vector Taylor Series For Speaker Recognition. Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 6788-6791. ISBN: 978-1-4799-0355-9.Detail
RATH, S.; BURGET, L.; KARAFIÁT, M.; GLEMBEK, O.; ČERNOCKÝ, J. A Region-specific Feature-space Transformation for Speaker Adaptation and Singularity Analysis of Jacobian Matrix. Proceedings of Interspeeech 2013. Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon: International Speech Communication Association, 2013. p. 1228-1232. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.Detail
RATH, S.; POVEY, D.; VESELÝ, K.; ČERNOCKÝ, J. Improved Feature Processing for Deep Neural Networks. Proceedings of Interspeech 2013. Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon: International Speech Communication Association, 2013. p. 109-113. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.Detail
KARAFIÁT, M.; VESELÝ, K.; SZŐKE, I.; BURGET, L.; GRÉZL, F.; HANNEMANN, M.; ČERNOCKÝ, J. BUT ASR System for BABEL Surprise Evaluation 2014. In Proceedings of 2014 Spoken Language Technology Workshop. South Lake Tahoe, Nevada: IEEE Signal Processing Society, 2014. p. 501-506. ISBN: 978-1-4799-7129-9.Detail
KARAFIÁT, M.; GRÉZL, F.; HANNEMANN, M.; ČERNOCKÝ, J. BUT Neural Network Features for Spontaneous Vietnamese in BABEL. In Proceedings of ICASSP 2014. Florencie: IEEE Signal Processing Society, 2014. p. 5659-5663. ISBN: 978-1-4799-2892-7.Detail
GLEMBEK, O.; MA, J.; MATĚJKA, P.; ZHANG, B.; PLCHOT, O.; BURGET, L.; MATSOUKAS, S. Domain Adaptation Via Within-class Covariance Correction in I-Vector Based Speaker Recognition Systerms. In Proceedings of ICASSP 2014. Florencie: IEEE Signal Processing Society, 2014. p. 4060-4064. ISBN: 978-1-4799-2892-7.Detail
MARTÍNEZ GONZÁLEZ, D.; BURGET, L.; STAFYLAKIS, T.; LEI, Y.; KENNY, P.; LLEIDA, E. Unscented Transform For Ivector-based Noisy Speaker Recognition. In Proceedings of ICASSP 2014. Florencie: IEEE Signal Processing Society, 2014. p. 4070-4074. ISBN: 978-1-4799-2892-7.Detail
KARAFIÁT, M.; GRÉZL, F.; VESELÝ, K.; HANNEMANN, M.; SZŐKE, I.; ČERNOCKÝ, J. BUT 2014 Babel System: Analysis of adaptation in NN based systems. In Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014. p. 3002-3006. ISBN: 978-1-63439-435-2.Detail
ŽIŽKA, J.; SZŐKE, I.; FAPŠO, M.: ProhlizecPrednasek; Audiovizuální prohlížeč přednášek. Fakulta informačních technologií, Vysoké učení technické v Brně. URL: https://www.fit.vut.cz/research/product/431/. (prototyp)Detail
POVEY, D.; GHOSHAL, A.; BOULIANNE, G.; BURGET, L.; GLEMBEK, O.; GOEL, N.; HANNEMANN, M.; MOTLÍČEK, P.; QIAN, Y.; SCHWARZ, P.; SILOVSKÝ, J.; STEMMER, G.; VESELÝ, K.: KALDI; KALDI speech recognition toolkit. http://kaldi.sourceforge.net/. URL: http://kaldi.sourceforge.net/. (software)Detail
Odkaz
http://www.tacr.cz/programy-ta-cr/program-alfa