Course detail

Modern Methods of Speech Processing

FIT-MZD, Acad. year: 2021/2022

From simple systems to stochastic modelling. Hidden Markov models. Large vocabulary continuous speech recognition. Language models. Speech production, speech perception: time and frequency. Data-driven methods for feature extraction. Speech databases. Excitation in speech coding, CELP. Speaker identification.
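Hidden Markov models are the core of this course, and the syllabus lists Viterbi search as the recognition algorithm. The following is a minimal sketch, assuming a discrete-observation HMM with a hand-set two-state toy model (hypothetical probabilities, not a speech model). It finds the most likely state sequence in the log domain, keeping only the best predecessor of each state at every time step.

```python
import math


def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely state sequence of a discrete HMM, computed in the log domain.

    log_init[s]       : log P(first state = s)
    log_trans[(p, s)] : log P(next state = s | current state = p)
    log_emit[(s, o)]  : log P(observation o | state s)
    """
    # delta[s] = best log-score of any partial path that ends in state s
    delta = {s: log_init[s] + log_emit[(s, obs[0])] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Keep only the best predecessor of s (max instead of the forward sum)
            best_prev = max(states, key=lambda p: delta[p] + log_trans[(p, s)])
            new_delta[s] = delta[best_prev] + log_trans[(best_prev, s)] + log_emit[(s, o)]
            ptr[s] = best_prev
        delta = new_delta
        backpointers.append(ptr)
    # Backtrack from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Toy two-state model with hypothetical probabilities (illustration only)
states = ["A", "B"]
init = logs({"A": 0.6, "B": 0.4})
trans = logs({("A", "A"): 0.7, ("A", "B"): 0.3, ("B", "A"): 0.4, ("B", "B"): 0.6})
emit = logs({("A", "x"): 0.9, ("A", "y"): 0.1, ("B", "x"): 0.2, ("B", "y"): 0.8})

path, score = viterbi(["x", "y", "y"], states, init, trans, emit)
print(path, math.exp(score))
```

Working in log-probabilities avoids numerical underflow on long utterances. Token passing, also named in the syllabus, organises this same best-path recursion around tokens propagated through a network of word models; it is not shown here.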

Language of instruction

Czech

Mode of study

Not applicable.

Guarantor

prof. Dr. Ing. Jan Černocký

Department

Department of Computer Graphics and Multimedia (UPGM)

Learning outcomes of the course unit

This course allows students to implement simple speech processing applications, such as voice control of a process. Above all, however, it enables them to take part in developing complex speech recognition and speech coding systems using modern methods, in both academic and industrial environments.

Prerequisites

Basic knowledge of digital signal processing; having attended a basic course on speech processing is an advantage.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

Not applicable.

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

We will cover methods currently implemented in industrial applications (such as mobile phones or commercially available recognizers), but will not neglect promising methods that so far exist only in laboratories. Attention will be paid to techniques derived from data and inspired by human audition and speech production.

Specification of controlled education, way of implementation and compensation for absences

Attendance is not checked; the course is evaluated on the basis of the results of an exam or a final report.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Not applicable.

Recommended reading

Dutoit, T.: An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997
Fukunaga, K.: Introduction to Statistical Pattern Recognition, Academic Press, 1990
Gold, B., Morgan, N.: Speech and Audio Signal Processing, John Wiley & Sons, 2000
Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998
Moore, B.C.J.: An Introduction to the Psychology of Hearing, Academic Press, 1989
Psutka, J.: Komunikace s počítačem mluvenou řečí, Academia, Praha, 1995
Texts from http://www.fit.vutbr.cz/~cernocky/speech/
Vapnik, V. N.: Statistical Learning Theory, Wiley-Interscience, 1998

Classification of course in study plans

Programme DIT Doctoral 0 year of study, winter semester, compulsory-optional
Programme CSE-PHD-4 Doctoral
branch DVI4, 0 year of study, winter semester, elective
Programme DIT-EN Doctoral 0 year of study, winter semester, compulsory-optional

Type of course unit

Lecture

39 hours, optional

Teacher / Lecturer

prof. Dr. Ing. Jan Černocký

Syllabus

Review of notions: signal vectors and parameter matrices, basic statistics.
Stochastic modeling of parameters, modeling of time by state sequences.
Hidden Markov models: basic structure, training.
Recognition of speech using HMM: Viterbi search, token passing.
Pronunciation dictionaries and language models.
Speech production and derived parameters: LPC, Log area ratios, line spectral pairs.
Speech perception and derived parameters: Mel-frequency cepstral coefficients, Perceptual linear prediction.
Temporal properties of hearing - RASTA filtering.
Training the feature extractor on the data - linear discriminant analysis.
Speech databases: standards, contents, speakers, annotations.
Vocoders and modeling of the excitation: multi-pulse and stochastic excitations (GSM coding).
CELP coding: long-term predictor, codebooks. Very low bit-rate coders.
Current methods of speaker identification and verification.
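The speech-production parameters in the syllabus (LPC, log area ratios) can both be obtained from the Levinson-Durbin recursion, which solves the LPC normal equations and yields the reflection coefficients as a by-product. The sketch below assumes the autocorrelation method on an already windowed frame and writes the inverse filter as A(z) = 1 + a1 z^-1 + ... + ap z^-p. Sign conventions for the reflection coefficients (and therefore for the log area ratios) differ between texts, so the LAR formula used here is one stated choice, not necessarily the course's. Line spectral pairs are not covered by this sketch.

```python
import math


def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) speech frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations recursively.

    Returns (a, k, err) where A(z) = a[0] + a[1] z^-1 + ... + a[order] z^-order
    with a[0] = 1, k holds the reflection coefficients and err is the
    final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    k = []
    for i in range(1, order + 1):
        # Correlation of the current order-(i-1) error with lag i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k_i = -acc / err
        prev = a[:]
        # Update lower-order coefficients, then append the new one
        for j in range(1, i):
            a[j] = prev[j] + k_i * prev[i - j]
        a[i] = k_i
        err *= 1.0 - k_i * k_i
        k.append(k_i)
    return a, k, err


def log_area_ratios(k):
    """Log area ratios from reflection coefficients (requires |k_i| < 1).

    Assumed convention for this sketch: LAR_i = log((1 + k_i) / (1 - k_i)).
    """
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]


# Autocorrelation of a first-order AR process with coefficient 0.9 (r[m] = 0.9**m)
r = [1.0, 0.9, 0.81]
a, k, err = levinson_durbin(r, order=2)
print(a, k, err)   # approximately [1, -0.9, 0], [-0.9, 0], 0.19
print(log_area_ratios(k))
```

For a true first-order process the second reflection coefficient comes out as zero, so the order-2 predictor collapses to the order-1 one; this is a quick sanity check of the recursion.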
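Mel-frequency cepstral coefficients rest on a bank of triangular filters spaced uniformly on a perceptual (mel) scale. The sketch assumes the widely used formula mel = 2595 log10(1 + f/700) and a hypothetical setup of 23 filters over 0 to 4000 Hz (as for 8 kHz telephone speech); other mel formulas and filter counts exist, and the course may use different ones.

```python
import math


def hz_to_mel(f_hz):
    """Mel value of a frequency, using the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_min, f_max):
    """n_filters + 2 frequencies (Hz), equally spaced on the mel scale.

    Filter j (1-based) is a triangle rising from edges[j-1] to a peak at
    edges[j] and falling back to zero at edges[j+1].
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]


def triangle_weight(f, left, center, right):
    """Weight of one triangular filter at frequency f (Hz)."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)


edges = mel_filter_edges(n_filters=23, f_min=0.0, f_max=4000.0)
# Edges are closely spaced at low frequencies and spread out at high frequencies
print([round(e) for e in edges[:5]], [round(e) for e in edges[-3:]])
```

To obtain MFCCs, the energy collected by each filter from the power spectrum is log-compressed and decorrelated with a discrete cosine transform; those steps are omitted here to keep the focus on the perceptual frequency warping.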
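Language models supply the prior probability of word sequences during recognition. As a minimal illustration, the sketch below trains a bigram model with add-one (Laplace) smoothing on a hypothetical three-sentence voice-command corpus. Add-one smoothing is chosen here only for brevity; practical recognizers use better-behaved smoothing and back-off schemes, and this is not presented as the course's method.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories, padding each sentence with <s> and </s>."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens[1:])  # tokens that can be predicted (not <s>)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
    return bigrams, histories, vocab


def bigram_prob(prev, cur, bigrams, histories, vocab):
    """Add-one (Laplace) smoothed P(cur | prev)."""
    return (bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab))


# Hypothetical toy training data in the style of simple voice commands
corpus = [["call", "home"], ["call", "office"], ["call", "home"]]
bigrams, histories, vocab = train_bigram(corpus)
print(bigram_prob("call", "home", bigrams, histories, vocab))    # = 3/7, about 0.4286
print(bigram_prob("call", "office", bigrams, histories, vocab))  # = 2/7, about 0.2857
```

Adding one to every count guarantees that unseen word pairs (for example "call" followed by "call") still receive a non-zero probability, while the probabilities for each history continue to sum to one over the vocabulary.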

Guided consultation in combined form of studies

26 hours, optional

Teacher / Lecturer

prof. Dr. Ing. Jan Černocký
