Project detail

Speech recognition for low-resource languages

Duration: 01.01.2012 — 31.12.2014

Funding sources

Czech Science Foundation - Postdoctoral Grants

- sole funder (2012-01-01 - 2014-12-31)

On the project

The project aims at speech recognition in situations where little training data and limited or no knowledge of the linguistics and phonetics of the target language are available. In the domain of acoustic models, we will investigate modern techniques for representing GMM/HMM parameters in subspaces. The project will also focus on the unsupervised creation of pronunciation dictionaries, in which sequences of phonemes are replaced by sequences of graphemes or by clusters of acoustic units trained on data. Important parts of the project are tests on standard data and participation in international evaluations.
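
For context, the subspace representation mentioned above is presumably the Subspace Gaussian Mixture Model (SGMM, listed among the keywords). In its basic form (ignoring substates), each HMM state j is described only by a low-dimensional state vector v_j, from which the means and mixture weights of its Gaussians i = 1..I are derived; a minimal sketch of the standard formulation, not taken from this project page:

\mu_{ji} = M_i v_j, \qquad
w_{ji} = \frac{\exp(w_i^\top v_j)}{\sum_{i'=1}^{I} \exp(w_{i'}^\top v_j)}, \qquad
\Sigma_{ji} = \Sigma_i

Only the vectors v_j are state-specific; the projection matrices M_i, weight vectors w_i and covariances \Sigma_i form a globally shared subspace that can be trained on other, better-resourced languages, which is what makes the model attractive when little target-language data is available.

The grapheme-based dictionary idea can be illustrated by a minimal Python sketch (hypothetical helper name, not the project's actual tooling): each word is simply "pronounced" as its own character sequence, so no phonetic knowledge of the target language is needed.

# Minimal grapheme-based lexicon sketch: one pronunciation per word,
# consisting of the word's own characters used as sub-word units.
def grapheme_lexicon(words):
    return {word: list(word.lower()) for word in words}

if __name__ == "__main__":
    for word, units in grapheme_lexicon(["hello", "world"]).items():
        print(word, " ".join(units))

The second option mentioned in the description, clusters of acoustic units, would replace these characters with data-driven units learned from the audio itself.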

Keywords
speech recognition, multilingual speech recognition, automatic pronunciation dictionary generation, SGMM

Project code

GPP202/12/P604

Default language

Czech

People responsible

Karafiát Martin, Ing., Ph.D. - principal investigator

Units

Department of Computer Graphics and Multimedia
- beneficiary (2011-04-21 - 2014-12-31)

Results

KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Improving Language Models for ASR Using Translated In-domain Data. Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012. p. 4405-4408. ISBN: 978-1-4673-0044-5.

KARAFIÁT, M.; GRÉZL, F.; VESELÝ, K.; HANNEMANN, M.; SZŐKE, I.; ČERNOCKÝ, J. BUT 2014 Babel System: Analysis of adaptation in NN based systems. In Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014. p. 3002-3006. ISBN: 978-1-63439-435-2.

VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F.; JANDA, M.; EGOROVA, E. The Language-Independent Bottleneck Features. Proceedings of IEEE 2012 Workshop on Spoken Language Technology. Miami: IEEE Signal Processing Society, 2012. p. 336-341. ISBN: 978-1-4673-5124-9.

EGOROVA, E.; VESELÝ, K.; KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J. Manual and Semi-Automatic Approaches to Building a Multilingual Phoneme Set. In Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 7324-7328. ISBN: 978-1-4799-0355-9.

MOTLÍČEK, P.; POVEY, D.; KARAFIÁT, M. Feature And Score Level Combination Of Subspace Gaussians In LVCSR Task. Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 7604-7608. ISBN: 978-1-4799-0355-9.

KARAFIÁT, M.; GRÉZL, F.; HANNEMANN, M.; VESELÝ, K.; ČERNOCKÝ, J. BUT BABEL System for Spontaneous Cantonese. In Proceedings of Interspeech 2013 (14th Annual Conference of the International Speech Communication Association). Lyon: International Speech Communication Association, 2013. p. 2589-2593. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.

GRÉZL, F.; KARAFIÁT, M. Semi-Supervised Bootstrapping Approach For Neural Network Feature Extractor Training. Proceedings of ASRU 2013. Olomouc: IEEE Signal Processing Society, 2013. p. 470-475. ISBN: 978-1-4799-2755-5.

GRÉZL, F.; EGOROVA, E.; KARAFIÁT, M. Further Investigation into Multilingual Training and Adaptation of Stacked Bottle-neck Neural Network Structure. In Proceedings of 2014 Spoken Language Technology Workshop. South Lake Tahoe, Nevada: IEEE Signal Processing Society, 2014. p. 48-53. ISBN: 978-1-4799-7129-9.

KARAFIÁT, M.; VESELÝ, K.; SZŐKE, I.; BURGET, L.; GRÉZL, F.; HANNEMANN, M.; ČERNOCKÝ, J. BUT ASR System for BABEL Surprise Evaluation 2014. In Proceedings of 2014 Spoken Language Technology Workshop. South Lake Tahoe, Nevada: IEEE Signal Processing Society, 2014. p. 501-506. ISBN: 978-1-4799-7129-9.

KARAFIÁT, M.; GRÉZL, F.; HANNEMANN, M.; ČERNOCKÝ, J. BUT Neural Network Features for Spontaneous Vietnamese in BABEL. In Proceedings of ICASSP 2014. Florence: IEEE Signal Processing Society, 2014. p. 5659-5663. ISBN: 978-1-4799-2892-7.

GRÉZL, F.; KARAFIÁT, M.; VESELÝ, K. Adaptation of Multilingual Stacked Bottle-neck Neural Network Structure for New Language. In Proceedings of ICASSP 2014. Florence: IEEE Signal Processing Society, 2014. p. 7704-7708. ISBN: 978-1-4799-2892-7.

GRÉZL, F.; KARAFIÁT, M. Adapting Multilingual Neural Network Hierarchy to a New Language. In Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU-2014). St. Petersburg: International Speech Communication Association, 2014. p. 39-45. ISBN: 978-5-8088-0908-6.

GRÉZL, F.; KARAFIÁT, M. Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages. In Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014. p. 820-824. ISBN: 978-1-63439-435-2.

NG, T.; HSIAO, R.; ZHANG, L.; KARAKOS, D.; MALLIDI, S.; KARAFIÁT, M.; VESELÝ, K.; SZŐKE, I.; ZHANG, B.; NGUYEN, L.; SCHWARTZ, R. Progress in the BBN Keyword Search System for the DARPA RATS Program. In Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014. p. 959-963. ISBN: 978-1-63439-435-2.

KARAFIÁT, M.; JANDA, M.; ČERNOCKÝ, J.; BURGET, L. Region Dependent Linear Transforms in Multilingual Speech Recognition. In Proc. International Conference on Acoustics, Speech, and Signal Processing 2012. Kyoto: IEEE Signal Processing Society, 2012. p. 4885-4888. ISBN: 978-1-4673-0044-5.