Course detail

Speech Signal Analysis and Synthesis

FEKT-MATe
Acad. year: 2010/2011

Phonetic description of the Czech language, signal windowing, preemphasis, pitch estimation. Representations of speech in the time and frequency domains, short-time analysis of the speech signal, selection of suitable features, word endpoint detection, linear and nonlinear time warping, isolated word recognition systems, connected word recognition, suitable speech units and features for speaker recognition, hidden Markov models, speaker identification, speaker verification, speech synthesis, vocoders, special integrated circuits for speech processing, and some typical applications of speech and speaker recognition.

Language of instruction

English

Number of ECTS credits

8

Mode of study

Not applicable.

Learning outcomes of the course unit

Students become familiar with the phonetic description of the Czech language, speech signal features, selection of suitable speech features, speech and speaker recognition systems, speech synthesis, vocoders, special integrated circuits for speech processing, and some typical applications.

Prerequisites

Subject knowledge at the Bachelor's degree level is required.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Teaching methods depend on the type of course unit, as specified in Article 7 of the BUT Rules for Studies and Examinations.

Assessment methods and criteria linked to learning outcomes

Computer exercises worth 30 points during the term and a written examination worth 70 points.

Course curriculum

Introduction, acoustic theory of speech production.
Vocal tract model, phonetic description of the Czech language.
Preprocessing of the speech signal: windowing, preemphasis.
Energy, zero-crossing rate and autocorrelation function.
Linear predictive coding and derived coefficients.
Cepstral analysis of the speech signal.
Estimation of the fundamental frequency of speech.
Linear and nonlinear time alignment.
Deterministic and statistical classifiers, hidden Markov models.
Classifier training, error rate estimation.
Voice recognition, speaker verification and identification.
Speech synthesis methods.
Speech coding and transmission, basic types of vocoders.
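
As an illustration of the preprocessing and short-time analysis topics listed above, the following is a minimal sketch in Python. It is not course material: the function names, the preemphasis coefficient of 0.97 and the frame sizes are assumptions chosen for the example.

```python
import math

def preemphasize(signal, alpha=0.97):
    """First-order preemphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frames(signal, frame_len, hop):
    """Split the signal into overlapping frames; an incomplete last frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame, window):
    """Sum of squared windowed samples."""
    return sum((x * w) ** 2 for x, w in zip(frame, window))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Usage: a 100 Hz tone sampled at 8 kHz, 30 ms frames with a 10 ms hop.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]
window = hamming(240)
for frame in frames(tone, 240, 80)[:3]:
    print(short_time_energy(frame, window), zero_crossing_rate(frame))
```

Voiced speech typically shows high short-time energy and a low zero-crossing rate, while unvoiced speech shows the opposite, which is why the two features are often used together.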

Work placements

Not applicable.

Aims

The aim of the course is to familiarise students with the basic methods for automatic recognition of isolated spoken words, with approaches to speaker verification and identification based on voice, and with speech synthesis methods.

Specification of controlled education, way of implementation and compensation for absences

Completion of all computer exercises is required.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

JELINEK, F., Statistical Methods for Speech Recognition. The MIT Press, Cambridge, MA, 1997.
KATAGIRI, S., Handbook of Neural Networks for Speech Processing. Artech House, London, 2000.
RABINER, L. R., JUANG, B. H., Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, N.J., 1993.

Recommended reading

Not applicable.

Classification of course in study plans

  • Programme EECC-MN Master's

    branch MN-EST, 1st year of study, summer semester, elective specialised

Type of course unit

 

Lecture

39 hours, optional

Teacher / Lecturer

Syllabus

Introduction, acoustic theory of speech production.
Vocal tract model, phonetic description of the Czech language.
Preprocessing of the speech signal: windowing, preemphasis.
Energy, zero-crossing rate and autocorrelation function.
Linear predictive coding and derived coefficients.
Cepstral analysis of the speech signal.
Estimation of the fundamental frequency of speech.
Linear and nonlinear time alignment.
Deterministic and statistical classifiers, hidden Markov models.
Classifier training, error rate estimation.
Voice recognition, speaker verification and identification.
Speech synthesis methods.
Speech coding and transmission, basic types of vocoders.

Exercise in computer lab

52 hours, compulsory

Teacher / Lecturer

Syllabus

Illustration of the speech waveform, details of phonemes.
Spectrum of typical vowel sounds, formant frequencies.
Spectrum analysis using Hamming and rectangular windows.
Short-time energy and zero-crossings for (un)voiced speech.
Detection of speech/pause and word boundaries.
Linear prediction of speech waveform and derived spectra.
Automatic recognition of vowels.
Correlations between various speech signal parameters.
Calculation of several distances between speech frames.
Automatic recognition of an unknown word.
Segmentation of a word string into phonetic units.
Measurement of fundamental frequency by center clipping.
Cepstral analysis for voiced speech.
Identification of different speakers.
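
As an illustration of the fundamental frequency exercise listed above, the following is a minimal sketch in Python that combines center clipping with an autocorrelation peak search. It is not the lab's actual procedure: the clipping ratio, the search range of 60 to 400 Hz, the voicing threshold and all function names are assumptions chosen for the example.

```python
import math

def center_clip(frame, ratio=0.3):
    """Zero samples below ratio * peak in magnitude; shift the rest toward zero."""
    threshold = ratio * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > threshold:
            clipped.append(x - threshold)
        elif x < -threshold:
            clipped.append(x + threshold)
        else:
            clipped.append(0.0)
    return clipped

def autocorrelation(frame, lag):
    """Unnormalised short-time autocorrelation at the given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0, voicing_threshold=0.3):
    """Return an F0 estimate in Hz, or None if the frame looks unvoiced or silent."""
    clipped = center_clip(frame)
    r0 = autocorrelation(clipped, 0)
    if r0 == 0.0:
        return None  # silent frame
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(clipped) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(clipped, lag))
    # Assumed voicing rule: the peak must be a sizeable fraction of the energy.
    if autocorrelation(clipped, best_lag) / r0 < voicing_threshold:
        return None
    return sample_rate / best_lag

# Usage: a synthetic 125 Hz "voiced" frame with one harmonic, sampled at 8 kHz.
sample_rate = 8000
voiced = [math.sin(2 * math.pi * 125 * n / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 250 * n / sample_rate)
          for n in range(400)]
print(estimate_f0(voiced, sample_rate))
```

Center clipping removes low-amplitude detail related to the formants, so the autocorrelation peak at the pitch period stands out more clearly than it would for the raw frame.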