Project detail

Theory and applications of phoneme posterior estimation in speech processing

Duration: 01.01.2009 - 31.12.2011

Funding sources

Czech Science Foundation - Doctoral grants (Doktorské granty)

- sole funder (2009-01-01 - 2011-12-31)

On the project

Considerable attention in basic speech processing research is devoted to estimating the posterior probabilities of discrete speech units, namely phonemes. The estimates are used in feature extraction (posterior features), in phonotactic models for language recognition, in generating phoneme lattices for keyword spotting, and in other applications. The goal of this project is to create a fast and reliable system for estimating phoneme posterior probabilities that makes it possible to reduce the error rates of the systems built on these estimates. The project will address feature extraction, discriminative transforms, classifier architectures, and training techniques. Quality will be assessed mainly in international evaluations organized by the US National Institute of Standards and Technology (NIST).
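The following minimal sketch shows how phoneme posteriors and posterior features relate. It assumes a small multilayer perceptron (MLP) with sigmoid hidden units and a softmax output over phoneme classes. The network size, the weights, and the function names (`mlp_phoneme_posteriors`, `posterior_features`) are hypothetical and serve only as illustration. Taking the logarithm of the posteriors follows the common TANDEM-style recipe for posterior features. It is not necessarily the exact pipeline developed in this project.

```python
import math


def softmax(logits):
    """Numerically stable softmax: maps classifier outputs to a probability distribution."""
    m = max(logits)  # subtract the maximum so math.exp cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_phoneme_posteriors(frame, w_hidden, b_hidden, w_out, b_out):
    """Hypothetical one-hidden-layer MLP: sigmoid hidden units, softmax output.

    frame    -- acoustic feature vector for one frame
    w_hidden -- one weight row per hidden unit; b_hidden -- hidden biases
    w_out    -- one weight row per phoneme class; b_out -- output biases
    Returns an estimate of P(phoneme | frame) for every phoneme class.
    """
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, frame)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w_out, b_out)]
    return softmax(logits)


def posterior_features(posteriors, floor=1e-10):
    """Log-posteriors, as in TANDEM-style posterior features.

    The floor avoids log(0). The log makes the highly skewed posteriors
    better suited to Gaussian-based acoustic models.
    """
    return [math.log(max(p, floor)) for p in posteriors]


# Toy example with made-up weights: 2 input features, 2 hidden units, 3 phoneme classes.
frame = [0.5, -1.0]
w_hidden = [[1.0, -1.0], [0.5, 0.5]]
b_hidden = [0.0, 0.0]
w_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_out = [0.0, 0.0, 0.0]

post = mlp_phoneme_posteriors(frame, w_hidden, b_hidden, w_out, b_out)
feats = posterior_features(post)
print("posteriors:", [round(p, 3) for p in post])
print("log-posterior features:", [round(f, 3) for f in feats])
```

In a real system the weights would be trained on labelled speech, and the input would be a context window of acoustic features rather than a single two-dimensional frame. The log-posteriors would also typically be decorrelated (for example by PCA) before being passed to an HMM-based recognizer.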

Keywords
speech processing, speech recognition, phoneme recognition, probabilistic features

Project code

GP102/09/P635

Default language

Czech

People responsible

Kopecký Jiří, Bc. - co-investigator
Plchot Oldřich, Ing., Ph.D. - co-investigator
Grézl František, Ing., Ph.D. - principal investigator

Results

GRÉZL, F.; KARAFIÁT, M.; BURGET, L. Investigation into bottle-neck features for meeting speech recognition. Proc. Interspeech 2009. Proceedings of Interspeech. Brighton: International Speech Communication Association, 2009. p. 2947-2950. ISBN: 978-1-61567-692-7. ISSN: 1990-9772.

GRÉZL, F.; KARAFIÁT, M. Hierarchical Neural Net Architectures for Feature Extraction in ASR. Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Proceedings of Interspeech. Makuhari, Chiba: International Speech Communication Association, 2010. p. 1201-1204. ISBN: 978-1-61782-123-3. ISSN: 1990-9772.

HAIN, T.; BURGET, L.; DINES, J.; GARNER, P.; EL HANNANI, A.; HUIJBREGTS, M.; KARAFIÁT, M.; LINCOLN, M.; WAN, V. The AMIDA 2009 Meeting Transcription System. Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Proceedings of Interspeech. Makuhari, Chiba: International Speech Communication Association, 2010. p. 358-361. ISBN: 978-1-61782-123-3. ISSN: 1990-9772.

SZŐKE, I.; GRÉZL, F.; ČERNOCKÝ, J.; FAPŠO, M. Acoustic keyword spotter - optimization from end-user perspective. Proceedings of the 2010 IEEE Spoken Language Technology Workshop. IEEE Catalog Number: CFP 10SLT-USB. Berkeley, California: IEEE Signal Processing Society, 2010. p. 177-181. ISBN: 978-1-4244-7902-3.

GRÉZL, F.; ČERNOCKÝ, J. Audio Surveillance through Known Event Classification. Radioengineering, 2009, vol. 18, no. 4, p. 671-675. ISSN: 1210-2512.

GRÉZL, F.; KARAFIÁT, M. Integrating recent MLP feature extraction techniques into TRAP architecture. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 1229-1232. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.

HAIN, T.; BURGET, L.; DINES, J.; GARNER, P.; GRÉZL, F.; EL HANNANI, A.; HUIJBREGTS, M.; KARAFIÁT, M.; LINCOLN, M.; WAN, V. Transcribing Meetings with the AMIDA System. IEEE Transactions on Audio, Speech, and Language Processing, 2012, vol. 20, no. 2, p. 486-498. ISSN: 1558-7916.

KOMBRINK, S.; MIKOLOV, T.; KARAFIÁT, M.; BURGET, L. Recurrent Neural Network based Language Modeling in Meeting Recognition. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 2877-2880. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.

VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F. Convolutive Bottleneck Network Features for LVCSR. Proceedings of ASRU 2011. Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 42-47. ISBN: 978-1-4673-0366-8.

GRÉZL, F. The Role of Neural Network Size in TRAP/HATS Feature Extraction. Proceedings Text, Speech and Dialogue 2011. Lecture Notes in Computer Science. LNAI 6836. Plzeň: Springer Verlag, 2011. p. 315-322. ISBN: 978-3-642-23537-5. ISSN: 0302-9743.

KOCKMANN, M.; FERRER, L.; BURGET, L.; ČERNOCKÝ, J. iVector Fusion of Prosodic and Cepstral Features for Speaker Verification. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 265-268. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.

BOŘIL, H.; GRÉZL, F.; HANSEN, J. Front-End Compensation Methods for LVCSR Under Lombard Effect. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 1257-1260. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.

GRÉZL, F.; KARAFIÁT, M.; JANDA, M. Study of Probabilistic and Bottle-Neck Features in Multilingual Environment. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 359-364. ISBN: 978-1-4673-0366-8.

MIKOLOV, T.; DEORAS, A.; KOMBRINK, S.; BURGET, L.; ČERNOCKÝ, J. Empirical Evaluation and Combination of Advanced Language Modeling Techniques. Proceedings of Interspeech 2011. Proceedings of Interspeech. Florence: International Speech Communication Association, 2011. p. 605-608. ISBN: 978-1-61839-270-1. ISSN: 1990-9772.