Course detail
Parallel Data Processing
FEKT-MPA-PZP
Acad. year: 2025/2026
Parallelization using CPU. Parallelization using GPU (matrix operations, deep learning algorithms). Technologies: Apache Spark, Hadoop, Kafka, Cassandra. Distributed computations for operations: data transformation, aggregation, classification, regression, clustering, frequent patterns, optimization. Data streaming – basic operations, state operations, monitoring. Further technologies for distributed computations.
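As a small illustration of the CPU-parallelization and aggregation operations listed above (a sketch only; the course's own examples may differ), a data transformation followed by an aggregation can be distributed across worker processes with Python's standard library:

```python
from multiprocessing import Pool

def square(x):
    # transformation applied independently to each record
    return x * x

if __name__ == "__main__":
    data = list(range(10))
    with Pool(processes=4) as pool:
        transformed = pool.map(square, data)  # parallel map over the workers
    total = sum(transformed)                  # aggregation of partial results
    print(total)  # 285
```

The same map-then-aggregate shape underlies the distributed operations in Apache Spark, where the map runs on cluster executors instead of local processes.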
Language of instruction
Number of ECTS credits
Mode of study
Guarantor
Department
Entry knowledge
Rules for evaluation and completion of the course
The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.
Aims
Students acquire skills in the design and implementation of various forms of parallel systems for solving big data challenges. They learn techniques for parallelizing computations on the CPU and GPU, as well as techniques for distributed computation. Students master the technologies Apache Spark, Kafka, and Cassandra for distributed data processing using data operations such as transformation, aggregation, classification, regression, clustering, and frequent pattern mining.
Study aids
Prerequisites and corequisites
Basic literature
Recommended reading
Classification of course in study plans
- Programme MPAD-ACS Master's 2 year of study, winter semester, compulsory
- Programme MPAD-CAN Master's 2 year of study, winter semester, compulsory
- Programme MPA-EAK Master's 0 year of study, winter semester, compulsory-optional
- Programme MPC-TIT Master's 0 year of study, winter semester, compulsory-optional
Type of course unit
Lecture
Teacher / Lecturer
Syllabus
1. Introduction to Parallel Computing.
2. CPU Parallel Computing – Designing Parallel Programs, Threads, Processes, Synchronization.
3. Introduction to GPU – Streaming Multiprocessors, Threads, Blocks, Grids, and PyCUDA.
4. GPU Memory – Global, Shared; Speed and Sizes.
5. GPU Synchronization – Atomic Operations, Warps.
6. GPU Parallel Patterns – Warp Shuffles, Asynchronous Function Execution, Parallel Reduction.
7. GPU Matrix Operations and Streams – Matrix Multiplication, Streams and Devices, Utilizing Multiple GPUs.
8. Introduction to Spark – Jobs, Stages, Tasks, DAG, etc.
9. Advanced Operations in Spark – Shared Variables, Partitioning, Web Interface, DataFrames.
10. Machine Learning with Spark – Statistics, Pipelines, Feature Extraction, Classification, Clustering, etc.
11. Spark Streaming – DStreams, SQL Operations, MLlib Operations.
12. Other Parallel Technologies – Apache Kafka, Nvidia Jetson, TPU.
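Topic 6 covers parallel reduction, a core GPU pattern in which an array is combined pairwise in log-many rounds. A minimal sketch of the same tree-shaped reduction, run on CPU threads rather than GPU threads (illustrative only; the lectures use PyCUDA for the GPU version):

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_pairwise(values):
    """Tree-style sum: halve the array each round, the way a GPU
    parallel reduction combines values across threads in a block."""
    values = list(values)
    while len(values) > 1:
        half = (len(values) + 1) // 2
        with ThreadPoolExecutor() as ex:
            # each worker adds one pair; an odd leftover passes through
            values = list(ex.map(
                lambda i: values[i] + values[i + half]
                          if i + half < len(values) else values[i],
                range(half),
            ))
    return values[0]

print(reduce_pairwise(range(16)))  # 120
```

The pairwise structure matters on a GPU because each round's additions are independent and can execute concurrently, giving O(log n) rounds instead of a sequential O(n) loop.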
Exercise in computer lab
Teacher / Lecturer
Syllabus
The topics are the same as for the lectures:
1. Introduction to Parallel Computing.
2. CPU Parallel Computing – Designing Parallel Programs, Threads, Processes, Synchronization.
3. Introduction to GPU – Streaming Multiprocessors, Threads, Blocks, Grids, and PyCUDA.
4. GPU Memory – Global, Shared; Speed and Sizes.
5. GPU Synchronization – Atomic Operations, Warps.
6. GPU Parallel Patterns – Warp Shuffles, Asynchronous Function Execution, Parallel Reduction.
7. GPU Matrix Operations and Streams – Matrix Multiplication, Streams and Devices, Utilizing Multiple GPUs.
8. Introduction to Spark – Jobs, Stages, Tasks, DAG, etc.
9. Advanced Operations in Spark – Shared Variables, Partitioning, Web Interface, DataFrames.
10. Machine Learning with Spark – Statistics, Pipelines, Feature Extraction, Classification, Clustering, etc.
11. Spark Streaming – DStreams, SQL Operations, MLlib Operations.
12. Other Parallel Technologies – Apache Kafka, Nvidia Jetson, TPU.
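Topic 11 introduces stateful stream processing. The sliding-window state kept by a DStream-style windowed count can be sketched in pure Python (a hypothetical `WindowedCounter` for illustration; Spark Streaming distributes this state across the cluster):

```python
from collections import Counter, deque

class WindowedCounter:
    """Sliding-window counts over a stream of micro-batches,
    mimicking DStream-style windowed state (illustrative only)."""
    def __init__(self, window_batches):
        # state operation: retain only the last N micro-batches
        self.window = deque(maxlen=window_batches)

    def update(self, batch):
        self.window.append(Counter(batch))
        total = Counter()
        for c in self.window:   # re-aggregate over the current window
            total += c
        return total

wc = WindowedCounter(window_batches=2)
wc.update(["a", "b", "a"])
counts = wc.update(["b", "c"])
print(counts["a"], counts["b"], counts["c"])  # 2 2 1
```

Once a third batch arrives, the first batch falls out of the two-batch window and its counts disappear from the result, which is exactly the behavior a windowed DStream operation provides.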
Project
Teacher / Lecturer
Syllabus