Course detail

Natural Language Processing

FIT-ZPD, Acad. year: 2021/2022

Foundations of natural language processing, historical perspective, statistical NLP, and the modern era dominated by machine learning and, specifically, deep neural networks. Meaning of individual words, lexicology and lexicography, word senses and neural architectures for computing word embeddings, word sense classification and inference. Constituency and dependency parsing, syntactic ambiguity, neural dependency parsers. Language modeling and its applications in general architectures. Machine translation, historical perspective on the statistical approach, neural translation and evaluation scores. End-to-end models, attention mechanisms, limits of current seq2seq models. Question answering based on neural models, information extraction components, text understanding challenges, learning by reading and machine comprehension. Text classification and its modern applications, convolutional neural networks for sentence classification. Language-independent representations, non-standard texts from social networks, representing parts of words, subword models. Contextual representations and pretraining for context-dependent language modules. Transformers and self-attention for generative models. Communication agents and natural language generation. Coreference resolution and its interconnection to other text understanding components.

Question topics for the State Doctoral Exams:

Distributional word semantics, Word2Vec, GloVe, and FastText models
Language modelling
Machine translation
Seq2seq models and attention mechanism
Question answering
Convolutional neural networks for sentence classification
Modeling contexts of use: Contextual representations and pretraining
Transformers and self-attention for generative models
Natural language generation
Coreference resolution

Language of instruction

Czech

Mode of study

Not applicable.

Guarantor

doc. RNDr. Pavel Smrž, Ph.D.

Department

Department of Computer Graphics and Multimedia (UPGM)

Learning outcomes of the course unit

Students will get acquainted with natural language processing and will understand a range of neural network models commonly applied in the field. They will also grasp the basics of neural implementations of attention mechanisms and sequence embedding models, and how these modular components can be combined to build state-of-the-art NLP systems. They will be able to implement and evaluate common neural network models for various NLP applications.
Students will improve their programming skills and gain knowledge of and practical experience with deep learning tools and with general processing of textual data.

Prerequisites

Not applicable.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

Discussions during lectures or individual consultations, and a review of the prepared report.

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

To understand natural language processing and to learn how to apply basic algorithms in this field. To get acquainted with the algorithmic description of the main language levels (morphology, syntax, semantics, and pragmatics) as well as with natural language data resources (corpora). To grasp the basics of knowledge representation, inference, and their relation to artificial intelligence.

Specification of controlled education, way of implementation and compensation for absences

Lectures and preparation of a report.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Not applicable.

Recommended reading

Deng, Li, and Yang Liu, eds. Deep Learning in Natural Language Processing. Springer, 2018.
Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, 2017.
Goldberg, Yoav. Neural Network Methods for Natural Language Processing. Synthesis Lectures on Human Language Technologies 10, no. 1 (2017): 1-309.
Raaijmakers, Stephan. Deep Learning for Natural Language Processing. Manning, 2019.

Classification of course in study plans

Programme DIT Doctoral 0 year of study, winter semester, compulsory-optional
Programme CSE-PHD-4 Doctoral, branch DVI4, 0 year of study, winter semester, elective
Programme DIT-EN Doctoral 0 year of study, winter semester, compulsory-optional

Type of course unit

Lecture

39 hours, optional

Teacher / Lecturer

doc. RNDr. Pavel Smrž, Ph.D.

Syllabus

Introduction, history of NLP, and modern approaches based on deep learning
Word senses and word vectors
Dependency parsing
Language models
Machine translation
Seq2seq models and attention
Question answering
Convolutional neural networks for sentence classification
Information from parts of words: Subword models
Modeling contexts of use: Contextual representations and pretraining
Transformers and self-attention for generative models
Natural language generation
Coreference resolution

Guided consultation in combined form of studies

26 hours, optional

Teacher / Lecturer

doc. RNDr. Pavel Smrž, Ph.D.
