Course detail

Speech Processing Systems

FIT-SRE

Acad. year: 2009/2010

Phonetics and phonology. Statistical pattern recognition. HMM training and adaptation. HMM recognition. Phoneme recognition. Keyword spotting and search. Speaker identification and verification. Language identification. CELP speech coding. Language modeling. Psycholinguistics. Probabilistic parsing.

Language of instruction

Czech

Number of ECTS credits

5

Mode of study

Not applicable.

Learning outcomes of the course unit

Students will extend the knowledge acquired in the basic speech signal processing and natural language processing courses toward modern methods. They will become acquainted with methods currently deployed in industrial applications (GSM telephones, commercially available recognizers) and with promising methods from the research environment. They will deepen their knowledge of natural language processing and language modeling. The course allows students to implement simple speech processing applications, for example voice control of a process. Above all, it enables them to join the development of complex speech recognition and coding systems in both academic and industrial environments.

Prerequisites

There are no prerequisites.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

Study evaluation is based on points obtained for the specified items. The minimum number of points to pass is 50.

Course curriculum

  1. Phonetics and phonology - syllable structure, phonological processes and distinctive features.
  2. Statistical pattern classification I. - Bayesian framework, Maximum likelihood learning, Gaussian mixture models. Features for GMM modeling.
  3. Statistical pattern classification II. - Artificial Neural Networks, Support vector machines. Sequence modeling - Hidden Markov models. 
  4. HMM training and adaptation - MLLR, MAP, discriminative training.
  5. HMM recognition - pronunciation dictionaries and networks, language modeling, decoding, lattices.
  6. Phoneme recognition. Keyword spotting and search - LVCSR, acoustic and phonetic lattices. Figure of Merit.
  7. Speaker identification and verification - GMM, SVM. Channel normalization and compensation - feature mapping, eigenvoices and nuisance attribute projection (NAP). Evaluation of speaker verification: DET curves, EER, cost function.
  8. Language identification - acoustic vs. phonotactic, evaluation.
  9. Speech coding - CELP framework - adaptive and stochastic codebooks, GSM standards.
  10. Language modeling 1 - n-gram models, class-based models
  11. Language modeling 2 - language-specific features, factored-language models
  12. Psycholinguistics - word recognition models, word associations
  13. Probabilistic parsing - inside-outside algorithm, dependency parsing

Work placements

Not applicable.

Aims

To extend knowledge of the structure of language (phonetics, phonology) and acquire the basics of statistical classifiers. To get acquainted with advanced methods of speech recognition and coding. To get acquainted with advanced methods of language modeling and syntactic analysis.

Specification of controlled education, way of implementation and compensation for absences

  • mid-term test - 20pts
  • presentation of projects - 30pts
  • exam - 50pts

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

  • Gussenhoven, J. and Jacobs, H.: Understanding Phonology, Oxford University Press, 1998, ISBN 0-340-69218-9.
  • Psutka, J.: Komunikace s počítačem mluvenou řečí, Academia, Praha, 1995, ISBN 80-200-0203-0.
  • Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000, ISBN 0-471-35154-7.
  • Moore, B.C.J.: An introduction to the psychology of hearing, Academic Press, 1989, ISBN 0-12-505627-3.
  • Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998, ISBN 0-262-10066-5.
  • Manning, C. and Schütze, H.: Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, May 1999.

Recommended reading

  • Psutka, J.: Komunikace s počítačem mluvenou řečí, Academia, Praha, 1995, ISBN 80-200-0203-0.
  • Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000, ISBN 0-471-35154-7.

Classification of course in study plans

  • Programme IT-MSC-2 Master's

    branch MBI, 0 year of study, winter semester, elective
    branch MBS, 0 year of study, winter semester, elective
    branch MGM, 2 year of study, winter semester, elective
    branch MIN, 0 year of study, winter semester, compulsory-optional
    branch MIN, 0 year of study, winter semester, elective
    branch MIS, 0 year of study, winter semester, elective
    branch MMI, 0 year of study, winter semester, elective
    branch MMM, 0 year of study, winter semester, elective
    branch MPS, 0 year of study, winter semester, elective
    branch MPV, 0 year of study, winter semester, elective
    branch MSK, 0 year of study, winter semester, elective

Type of course unit

 

Lecture

39 hours, optional

Teacher / Lecturer

Syllabus

  1. Phonetics and phonology - syllable structure, phonological processes and distinctive features.
  2. Statistical pattern classification I. - Bayesian framework, Maximum likelihood learning, Gaussian mixture models. Features for GMM modeling.
  3. Statistical pattern classification II. - Artificial Neural Networks, Support vector machines. Sequence modeling - Hidden Markov models. 
  4. HMM training and adaptation - MLLR, MAP, discriminative training.
  5. HMM recognition - pronunciation dictionaries and networks, language modeling, decoding, lattices.
  6. Phoneme recognition. Keyword spotting and search - LVCSR, acoustic and phonetic lattices. Figure of Merit.
  7. Speaker identification and verification - GMM, SVM. Channel normalization and compensation - feature mapping, eigenvoices and nuisance attribute projection (NAP). Evaluation of speaker verification: DET curves, EER, cost function.
  8. Language identification - acoustic vs. phonotactic, evaluation.
  9. Speech coding - CELP framework - adaptive and stochastic codebooks, GSM standards.
  10. Language modeling 1 - n-gram models, class-based models
  11. Language modeling 2 - language-specific features, factored-language models
  12. Psycholinguistics - word recognition models, word associations
  13. Probabilistic parsing - inside-outside algorithm, dependency parsing

Project

13 hours, optional

Teacher / Lecturer