Přístupnostní navigace
E-přihláška
Vyhledávání Vyhledat Zavřít
Detail projektu
Období řešení: 01.09.2009 — 31.08.2013
Zdroje financování
Ministerstvo průmyslu a obchodu ČR - TIP
- částečně financující (2009-09-01 - 2013-08-30)
O projektu
Projekt je zaměřen na výzkum, vývoj a ověření technologií, které umožní prototypovat systémy pro rozpoznávání a vyhledávání v řeči pouze s několika hodinami přepsaných trénovacích dat, bez fonetické a lingvistické expertízy. Tyto technologie budou ověřeny v oblasti elektronických slovníků.
Popis anglickyThe proposed project aims at research, development and assessment of technologies for prototyping of speech recognition and search systems with only a few hours of transcribed training data, without the need for phonetic or linguistic expertise. These technologies will be tested in the domain of electronic dictionaries.
Klíčová slovamultiligvalita, rozpoznávání řeči, detekceklíčových slov, elektronické slovníky
Klíčová slova anglickymultilinguality, speech recognition, keyword spotting, electronic dictionaries
Označení
FR-TI1/034
Originální jazyk
čeština
Řešitelé
Černocký Jan, prof. Dr. Ing. - hlavní řešitelBurget Lukáš, doc. Ing., Ph.D. - spoluřešitelGrézl František, Ing., Ph.D. - spoluřešitelKarafiát Martin, Ing., Ph.D. - spoluřešitelMatějka Pavel, Ing., Ph.D. - spoluřešitelSchwarz Petr, Ing., Ph.D. - spoluřešitelŽižka Josef, Ing. - spoluřešitel
Útvary
Ústav počítačové grafiky a multimédií- spolupříjemce (01.09.2009 - 30.08.2013)Lingea s.r.o.- příjemce (01.09.2009 - 30.08.2013)
Výsledky
POVEY, D.; BURGET, L.; AGARWAL, M.; AKYAZI, P.; FENG, K.; GHOSHAL, A.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; RASTROW, A.; ROSE, R.; SCHWARZ, P.; THOMAS, S. Subspace Gaussian mixture models for speech recognition. Proc. International Conference on Acoustics, Speech, and Signal Processing. Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 4330-4333. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.Detail
POVEY, D.; GHOSHAL, A.; BOULIANNE, G.; BURGET, L.; GLEMBEK, O.; GOEL, N.; HANNEMANN, M.; MOTLÍČEK, P.; QIAN, Y.; SCHWARZ, P.; SILOVSKÝ, J.; STEMMER, G.; VESELÝ, K. The Kaldi Speech Recognition Toolkit. Proceedings of ASRU 2011. Hilton Waikoloa Village Resort, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.Detail
GHOSHAL, A.; POVEY, D.; AGARWAL, M.; AKYAZI, P.; BURGET, L.; FENG, K.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; RASTROW, A.; ROSE, R.; SCHWARZ, P.; THOMAS, S. A novel estimation of feature-space MLLR for full_covariance models. Proc. International Conference on Acoustics, Speech, and Signal Processing. Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 4310-4313. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.Detail
GOEL, N.; THOMAS, S.; AGARWAL, M.; AKYAZI, P.; BURGET, L.; FENG, K.; GHOSHAL, A.; GLEMBEK, O.; KARAFIÁT, M.; POVEY, D.; RASTROW, A.; ROSE, R.; SCHWARZ, P. Approaches to automatic lexicon learning with limited training examples. Proc. International Conference on Acoustics, Speech, and Signal Processing. Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 5094-5097. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.Detail
POVEY, D.; KARAFIÁT, M.; GHOSHAL, A.; SCHWARZ, P. A Symmetrization of the Subspace Gaussian Mixture Model. Proceedings of 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing. Praha: IEEE Signal Processing Society, 2011. p. 4504-4507. ISBN: 978-1-4577-0537-3.Detail
POVEY, D.; BURGET, L.; AGARWAL, M.; AKYAZI, P.; GHOSHAL, A.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; RASTROW, A.; ROSE, R.; SCHWARZ, P.; THOMAS, S. The subspace Gaussian mixture model-A structured model for speech recognition. COMPUTER SPEECH AND LANGUAGE, 2011, vol. 25, no. 2, p. 404-439. ISSN: 0885-2308.Detail
KARAFIÁT, M.; BURGET, L.; MATĚJKA, P.; GLEMBEK, O.; ČERNOCKÝ, J. iVector-Based Discriminative Adaptation for Automatic Speech Recognition. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 152-157. ISBN: 978-1-4673-0366-8.Detail
VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F. Convolutive Bottleneck Network Features for LVCSR. Proceedings of ASRU 2011. Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 42-47. ISBN: 978-1-4673-0366-8.Detail
MIKOLOV, T.; DEORAS, A.; POVEY, D.; BURGET, L.; ČERNOCKÝ, J. Strategies for Training Large Scale Neural Network Language Models. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 196-201. ISBN: 978-1-4673-0366-8.Detail
GRÉZL, F.; KARAFIÁT, M.; JANDA, M. Study of Probabilistic and Bottle-Neck Features in Multilingual Environment. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 359-364. ISBN: 978-1-4673-0366-8.Detail
KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Improving Language Models for ASR Using Translated In-domain Data. Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012. p. 4405-4408. ISBN: 978-1-4673-0044-5.Detail
KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J.; BURGET, L. Region Dependent Linear Transforms in Multilingual Speech Recognition. In Proc. International Conference on Acoustics, Speech, and Signal Processing 2012. Kyoto: IEEE Signal Processing Society, 2012. p. 4885-4888. ISBN: 978-1-4673-0044-5.Detail
JANDA, M. Grapheme Based Speech Recognition. Proceedings of the 18th Conference STUDENT EEICT 2012. Brno: Brno University of Technology, 2012. p. 441-445. ISBN: 978-80-214-4460-7.Detail
BRUMMER, J.; CUMANI, S.; GLEMBEK, O.; KARAFIÁT, M.; MATĚJKA, P.; PEŠÁN, J.; PLCHOT, O.; SOUFIFAR, M.; DE VILLIERS, E.; ČERNOCKÝ, J. Description and analysis of the Brno276 system for LRE2011. In Proceedings of Odyssey 2012: The Speaker and Language Recognition Workshop. Singapur: International Speech Communication Association, 2012. p. 216-223. ISBN: 978-981-07-3093-2.Detail
PLCHOT, O.; KARAFIÁT, M.; BRUMMER, J.; GLEMBEK, O.; MATĚJKA, P.; DE VILLIERS, E.; ČERNOCKÝ, J. Speaker vectors from Subspace Gaussian Mixture Model as complementary features for Language Identification. In Proceedings of Odyssey 2012, The Speaker and Language Recognition Workshop. Singapur: International Speech Communication Association, 2012. p. 330-333. ISBN: 978-981-07-3093-2.Detail
MIKOLOV, T.; KOMBRINK, S.; DEORAS, A.; BURGET, L.; ČERNOCKÝ, J. RNNLM - Recurrent Neural Network Language Modeling Toolkit. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.Detail
VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F.; JANDA, M.; EGOROVA, E. The Language-Independent Bottleneck Features. Proceedings of IEEE 2012 Workshop on Spoken Language Technology. Miami: IEEE Signal Processing Society, 2012. p. 336-341. ISBN: 978-1-4673-5124-9.Detail
JANDA, M.; KARAFIÁT, M.; ČERNOCKÝ, J. Dealing with Numbers in Grapheme-Based Speech Recognition. Proceedings of 15th International Conference on Text, Speech and Dialogue. Lecture Notes in Computer Science. Lecture Notes in Computer Science, 2012, Volume 7499. Springer-Verlag Berlin Heidelberg 2012: Springer Verlag, 2012. p. 438-445. ISBN: 978-3-642-32789-6. ISSN: 0302-9743.Detail
SZŐKE, I.; FAPŠO, M.; VESELÝ, K. BUT2012 přístup pro Spoken Web Search úkol na MediaEval2012. Working Notes Proceedings of the MediaEval 2012 Workshop. CEUR Workshop Proceedings. Pisa: CEUR-WS.org, 2012. s. 1-2. ISSN: 1613-0073.Detail
TEJEDOR, J.; FAPŠO, M.; SZŐKE, I.; ČERNOCKÝ, J.; GRÉZL, F. Comparison of methods for language-dependent and language-independent query-by-example spoken term detection. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2012, vol. 2012, no. 30, p. 1-34. ISSN: 1046-8188.Detail
JANDA, M. Automatic Generation Of Pronunciation Dictionaries Based On Diarization. Proceedings of the 19th Conference Student EEICT 2013. Brno: Brno University of Technology, 2013. p. 228-232. ISBN: 978-80-214-4695-3.Detail
EGOROVA, E.; VESELÝ, K.; KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J. Manual and Semi-Automatic Approaches to Building a Multilingual Phoneme Set. In Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 7324-7328. ISBN: 978-1-4799-0355-9.Detail
SOUFIFAR, M.; BURGET, L.; PLCHOT, O.; CUMANI, S.; ČERNOCKÝ, J. Regularized Subspace n-Gram Model for Phonotactic iVector Extraction. Proceedings of Interspeech 2013. Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon: International Speech Communication Association, 2013. p. 74-78. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.Detail
BURGET, L.; SCHWARZ, P.; AGARWAL, M.; AKYAZI, P.; FENG, K.; GHOSHAL, A.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; POVEY, D.; RASTROW, A.; ROSE, R.; THOMAS, S. Multilingual acoustic modeling for speech recognition based on Subspace Gaussian Mixture Models. Proc. International Conference on Acoustictics, Speech, and Signal Processing. Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. p. 4334-4337. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.Detail
KARAFIÁT, M.; GRÉZL, F.; EGOROVA, E.; JANDA, M.; ČERNOCKÝ, J.; KAŠPAR, M.: ZB - FR-TI1/034; Prototypování rozpoznávačů řeči pro nové jazyky. http://www.fit.vutbr.cz/research/groups/speech/publi/2013/Overena_technologie_2013_Projekt_FR_TI1_034.pdf. URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2013/Overena_technologie_2013_Projekt_FR_TI1_034.pdf. (ověřená technologie)Detail
KARAFIÁT, M.; GRÉZL, F.; EGOROVA, E.; JANDA, M.; ČERNOCKÝ, J.: R - MPO TIP FR-TI1/034; Multilingvální modely pro rozpoznávání řeči. Produkt je umístěn na serverech ÚPGM FIT VUT v Brně.. URL: https://www.fit.vut.cz/research/product/375/. (software)Detail