Project detail
Rozpoznávání mluvené řeči v reálných podmínkách (Speech Recognition in Real Conditions)
Duration: 1.1.2008 - 31.12.2011
Funding resources
Czech Science Foundation (Grantová agentura České republiky) - Standard projects
About the project
A project on speech recognition in real conditions. It builds on earlier grant-supported research in which the research team developed, and partly implemented, basic methods for speech recognition in Czech. For these methods to be deployed successfully in the most in-demand applications, such as transcription of conversations, recorded discussions, or courtroom proceedings, attention must turn to the analysis and modelling of everyday (colloquial) speech recorded in real conditions, in the presence of noise, background sounds, and possibly other speakers.
Description in English
This project follows on from earlier research projects in which the participating
teams developed and implemented basic speech recognition algorithms for Czech.
For these algorithms to be used successfully in the most challenging applications,
such as transcription of talks or recordings of court hearings, the research must
continue with the analysis and modelling of colloquial speech recorded in real
conditions (e.g. with varying backgrounds, noise, or cross-talk). The
main goals of this four-year project are to design and test new techniques for
speech feature extraction, background or noise suppression, speaker change-point
detection, and quick adaptation to new speaker characteristics; to improve the
lexical and phonetic inventories of recognition systems for colloquial speech; and
to develop language models with better coverage of the inflective nature of Czech.
The project will contribute to advancing the state of the art in basic research on
speech recognition and will facilitate the integration of the involved teams into
the European research community.
Keywords
rozpoznávání řeči (speech recognition)
Key words in English
speech recognition
Project code
GA102/08/0707
Default language
Czech
People responsible
Pollák Petr - principal person responsible
Burget Lukáš, doc. Ing., Ph.D. - fellow researcher
Černocký Jan, prof. Dr. Ing. - fellow researcher
Matějka Pavel, Ing., Ph.D. - fellow researcher
Schwarz Petr, Ing., Ph.D. - fellow researcher
Units
Department of Computer Graphics and Multimedia
- responsible department (10.4.2008 - not assigned)
Faculty of Information Technology
- responsible department (13.5.2011 - not assigned)
Speech Data Mining Research Group BUT Speech@FIT
- internal (10.4.2008 - 31.12.2011)
Department of Computer Graphics and Multimedia
- co-beneficiary (1.1.2008 - 31.12.2011)
Faculty of Information Technology
- beneficiary (13.5.2011 - not assigned)
Results
SCHWARZ, P. Phoneme recognition based on long temporal context. Brno: Faculty of Information Technology BUT, 2009. p. 1-95.
KARAFIÁT, M. Study of linear transformations applied to training of cross-domain adapted large vocabulary continuous speech recognition systems. Brno: 2009.
DEORAS, A.; MIKOLOV, T.; KOMBRINK, S.; CHURCH, K. Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model. Speech Communication, 2012, vol. 2012, no. 8, p. 1-16. ISSN: 0167-6393.
MIKOLOV, T.; KOMBRINK, S.; DEORAS, A.; BURGET, L.; ČERNOCKÝ, J. RNNLM - Recurrent Neural Network Language Modeling Toolkit. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.
HAIN, T.; BURGET, L.; DINES, J.; GARNER, P.; GRÉZL, F.; EL HANNANI, A.; HUIJBREGTS, M.; KARAFIÁT, M.; LINCOLN, M.; WAN, V. Transcribing Meetings with the AMIDA System. IEEE Transactions on Audio, Speech, and Language Processing, 2012, vol. 20, no. 2, p. 486-498. ISSN: 1558-7916.
KOMBRINK, S.; HANNEMANN, M.; BURGET, L. Out-of-Vocabulary Word Detection and Beyond. In Detection and Identification of Rare Audiovisual Cues. Studies in Computational Intelligence, 384. Springer-Verlag Berlin Heidelberg: Springer Verlag, 2012. p. 57-65. ISBN: 978-3-642-24033-1.
CUMANI, S.; PLCHOT, O.; KARAFIÁT, M. Independent Component Analysis and MLLR Transforms for Speaker Identification. Proc. International Conference on Acoustics, Speech, and Signal P. Kyoto: IEEE Signal Processing Society, 2012. p. 4365-4368. ISBN: 978-1-4673-0044-5.
POVEY, D.; HANNEMANN, M.; BOULIANNE, G.; BURGET, L.; GHOSHAL, A.; JANDA, M.; KARAFIÁT, M.; KOMBRINK, S.; MOTLÍČEK, P.; QIAN, Y.; RIEDHAMMER, K.; VESELÝ, K.; VU, N. Generating Exact Lattices in The WFST Framework. Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012. p. 4213-4216. ISBN: 978-1-4673-0044-5.
KOMBRINK, S.; MIKOLOV, T. Recurrent Neural Network Language Modeling Applied to the Brno AMI/AMIDA 2009 Meeting Recognizer Setup. Proceedings of the 17th Conference STUDENT EEICT 2011. Volume 3. Brno: Brno University of Technology, 2011. p. 527-531. ISBN: 978-80-214-4273-3.
PEŠÁN, J. Rozpoznávání mluvčího na mobilním telefonu [Speaker recognition on a mobile phone]. Proceedings of the 17th Conference Student EEICT 2011. Volume 2. Brno: Brno University of Technology, 2011. p. 341-343. ISBN: 978-80-214-4272-6.
GRÉZL, F.; KARAFIÁT, M.; JANDA, M. Study of Probabilistic and Bottle-Neck Features in Multilingual Environment. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 359-364. ISBN: 978-1-4673-0366-8.
MIKOLOV, T.; DEORAS, A.; POVEY, D.; BURGET, L.; ČERNOCKÝ, J. Strategies for Training Large Scale Neural Network Language Models. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 196-201. ISBN: 978-1-4673-0366-8.
BOŘIL, H.; GRÉZL, F.; HANSEN, J. Front-End Compensation Methods for LVCSR Under Lombard Effect. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 1257-1260. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.
KOCKMANN, M.; FERRER, L.; BURGET, L.; ČERNOCKÝ, J. iVector Fusion of Prosodic and Cepstral Features for Speaker Verification. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 265-268. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.
GRÉZL, F. The Role of Neural Network Size in TRAP/HATS Feature Extraction. Proceedings Text, Speech and Dialogue 2011. Lecture Notes in Computer Science. LNAI 6836. Plzeň: Springer Verlag, 2011. p. 315-322. ISBN: 978-3-642-23537-5. ISSN: 0302-9743.
VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F. Convolutive Bottleneck Network Features for LVCSR. Proceedings of ASRU 2011. Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 42-47. ISBN: 978-1-4673-0366-8.
KARAFIÁT, M.; BURGET, L.; MATĚJKA, P.; GLEMBEK, O.; ČERNOCKÝ, J. iVector-Based Discriminative Adaptation for Automatic Speech Recognition. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 152-157. ISBN: 978-1-4673-0366-8.
KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Recurrent Neural Network based Language Modeling in Meeting Recognition. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 2877-2880. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.
MIKOLOV, T.; DEORAS, A.; KOMBRINK, S.; BURGET, L.; ČERNOCKÝ, J. Empirical Evaluation and Combination of Advanced Language Modeling Techniques. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 605-608. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.
GRÉZL, F.; KARAFIÁT, M. Integrating recent MLP feature extraction techniques into TRAP architecture. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 1229-1232. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.
Responsibility: Pollák Petr