Course detail

Speech Processing

FEKT-MPC-ZRE

Acad. year: 2025/2026

The course gives a comprehensive overview of speech processing in verbal communication. It first introduces speech production and perception, the human auditory system, and the process of hearing. It then discusses the segmental and suprasegmental parameters commonly used in speech analysis. The main areas of speech processing are covered, in particular speech analysis, pattern recognition, speech synthesis, and speech coding, along with pitch analysis, prosody modelling, emotion analysis, analysis of pathological voice, speech de-identification, and speech watermarking. Attention is also paid to single-channel and multi-channel speech enhancement methods and noise cancellation. Finally, subjective and objective methods of assessing speech quality and intelligibility are introduced.

Language of instruction

Czech

Number of ECTS credits

Mode of study

Not applicable.

Guarantor

prof. Ing. Zdeněk Smékal, CSc.

Department

Department of Telecommunications (UTKO)

Entry knowledge

Knowledge at the Bachelor's degree level is required, together with knowledge of digital signal processing methods and algorithms. Students should also have basic Matlab programming skills.

Rules for evaluation and completion of the course

Computer lab exercises are mandatory, and students must obtain the required credit for them in order to pass the course. Up to 30 of the 100 points can be earned in the computer labs. The remaining 70 points can be earned by passing the final exam.
The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.

Aims

The aim of the course is to give a comprehensive overview of speech communication in information and telecommunication systems. It is intended for students who want to learn basic and advanced techniques of speech processing, analysis, synthesis, and coding. In addition to the basic principles of speaker identification, students will become familiar with the problem of separating speech from a noisy background, the principles of automatic speech recognition, and applications in health monitoring systems. Students will also analyse speech in real time in computer lab exercises.
On completion of the course, students are able to:
- describe the vocal and auditory tracts, and how speech is produced and perceived
- analyse speech using the most common segmental and suprasegmental parameters
- apply cepstral and linear predictive analysis
- use machine learning in the field of speech processing (speech recognition, speaker recognition, speech pathology identification, emotion detection, etc.)
- design and implement a text-to-speech system based on concatenative synthesis
- model the vocal tract and perform speech coding
- use objective and subjective tests to assess speech quality and intelligibility
- enhance speech using single-channel and multi-channel methods
- design a speech watermarking and de-identification system
- process and analyse speech signals in the Matlab environment

Study aids

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

PSUTKA, J.; MÜLLER, L.; MATOUŠEK, J.; RADOVÁ, V. Mluvíme s počítačem česky. 1st ed. Praha: Academia, 2006. ISBN 978-80-200-1309-5. (CS)
SMÉKAL, Z. Zpracování řeči. Brno: Vysoké učení technické v Brně, 2012. pp. 1-171. ISBN 978-80-214-4896-4. (CS)

Type of course unit

Lecture

26 hours, optional

Teacher / Lecturer

prof. Ing. Zdeněk Smékal, CSc.

Syllabus

1. Speech production and perception. The auditory system and the process of hearing
2. Speech signal analysis, segmental and suprasegmental parameters I, analysis of the fundamental frequency (pitch) of speech
3. Speech signal analysis, segmental and suprasegmental parameters II
4. Speech signal analysis III, pattern recognition (distance-based classification)
5. Pattern recognition (statistical classifiers)
6. Speech synthesis and TTS systems, prosody modelling
7. Speech coding and transmission
8. Objective and subjective methods of assessing speech quality and intelligibility
9. Single-channel and multi-channel speech enhancement methods
10. Emotion analysis and its applications
11. Analysis of neurodegenerative diseases
12. Speech watermarking, speech de-identification

Laboratory exercise

39 hours, compulsory

Teacher / Lecturer

prof. Ing. Zdeněk Smékal, CSc.

Syllabus

1. Phonetic and acoustic analysis of speech elements. Preprocessing of speech signals.
2. Suprasegmental features
3. Linear predictive analysis of speech
4. Cepstral analysis of speech
5. Pattern recognition
6. Classifiers. Feature space reduction.
7. TTS systems
8. Project registration and written test
9. Project work
10. Project work
11. Project work
12. Project submission and defence
