Course detail
Parallel Data Processing
FEKT-MPA-PZP
Acad. year: 2025/2026
Parallelization using CPU. Parallelization using GPU (matrix operations, deep learning algorithms). Technologies: Apache Spark, Hadoop, Kafka, Cassandra. Distributed computations for operations: data transformation, aggregation, classification, regression, clustering, frequent patterns, optimization. Data streaming – basic operations, state operations, monitoring. Further technologies for distributed computations.
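As a small illustration of the CPU-parallelization and aggregation operations listed above (a sketch only; the course's own examples may differ), a data transformation followed by an aggregation can be distributed across worker processes with Python's standard library:

```python
from multiprocessing import Pool

def square(x):
    # transformation applied independently to each record
    return x * x

if __name__ == "__main__":
    data = list(range(10))
    with Pool(processes=4) as pool:
        transformed = pool.map(square, data)  # parallel map over the workers
    total = sum(transformed)                  # aggregation of partial results
    print(total)  # 285
```

The same map-then-aggregate shape underlies the distributed operations in Apache Spark, where the map runs on cluster executors instead of local processes.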
Language of instruction
Number of ECTS credits
Mode of study
Guarantor
Department
Entry knowledge
Rules for evaluation and completion of the course
The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.
Aims
Students acquire skills in the design and implementation of various forms of parallel systems for solving big data challenges. They learn techniques for parallelizing computations on the CPU and GPU, as well as techniques for distributed computation. Students master the technologies Apache Spark, Kafka, and Cassandra for distributed data processing using data operations such as transformation, aggregation, classification, regression, clustering, and frequent pattern mining.
Study aids
Prerequisites and corequisites
Basic literature
Recommended reading
Classification of course in study plans
- Programme MPAD-ACS Master's 2 year of study, winter semester, compulsory
- Programme MPAD-CAN Master's 2 year of study, winter semester, compulsory
- Programme MPA-EAK Master's 0 year of study, winter semester, compulsory-optional
- Programme MPC-TIT Master's 0 year of study, winter semester, compulsory-optional
Type of course unit
Lecture
Teacher / Lecturer
Syllabus
1. Introduction to Parallel Computing.
2. CPU Parallel Computing – Designing Parallel Programs, Threads, Processes, Synchronization.
3. Introduction to GPU – Streaming Multiprocessors, Threads, Blocks, Grids, and PyCUDA.
4. GPU Memory – Global, Shared; Speed and Sizes.
5. GPU Synchronization – Atomic Operations, Warps.
6. GPU Parallel Patterns – Warp Shuffles, Asynchronous Function Execution, Parallel Reduction.
7. GPU Matrix Operations and Streams – Matrix Multiplication, Streams and Devices, Utilizing Multiple GPUs.
8. Introduction to Spark – Jobs, Stages, Tasks, DAG, etc.
9. Advanced Operations in Spark – Shared Variables, Partitioning, Web Interface, DataFrames.
10. Machine Learning with Spark – Statistics, Pipelines, Feature Extraction, Classification, Clustering, etc.
11. Spark Streaming – DStreams, SQL Operations, MLlib Operations.
12. Other Parallel Technologies – Apache Kafka, Nvidia Jetson, TPU.
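Topic 6 covers parallel reduction, a core GPU pattern in which an array is combined pairwise in log-many rounds. A minimal sketch of the same tree-shaped reduction, run on CPU threads rather than GPU threads (illustrative only; the lectures use PyCUDA for the GPU version):

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_pairwise(values):
    """Tree-style sum: halve the array each round, the way a GPU
    parallel reduction combines values across threads in a block."""
    values = list(values)
    while len(values) > 1:
        half = (len(values) + 1) // 2
        with ThreadPoolExecutor() as ex:
            # each worker adds one pair; an odd leftover passes through
            values = list(ex.map(
                lambda i: values[i] + values[i + half]
                          if i + half < len(values) else values[i],
                range(half),
            ))
    return values[0]

print(reduce_pairwise(range(16)))  # 120
```

The pairwise structure matters on a GPU because each round's additions are independent and can execute concurrently, giving O(log n) rounds instead of a sequential O(n) loop.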
Exercise in computer lab
Teacher / Lecturer
Syllabus
The topics are the same as for the lectures:
1. Introduction to Parallel Computing.
2. CPU Parallel Computing – Designing Parallel Programs, Threads, Processes, Synchronization.
3. Introduction to GPU – Streaming Multiprocessors, Threads, Blocks, Grids, and PyCUDA.
4. GPU Memory – Global, Shared; Speed and Sizes.
5. GPU Synchronization – Atomic Operations, Warps.
6. GPU Parallel Patterns – Warp Shuffles, Asynchronous Function Execution, Parallel Reduction.
7. GPU Matrix Operations and Streams – Matrix Multiplication, Streams and Devices, Utilizing Multiple GPUs.
8. Introduction to Spark – Jobs, Stages, Tasks, DAG, etc.
9. Advanced Operations in Spark – Shared Variables, Partitioning, Web Interface, DataFrames.
10. Machine Learning with Spark – Statistics, Pipelines, Feature Extraction, Classification, Clustering, etc.
11. Spark Streaming – DStreams, SQL Operations, MLlib Operations.
12. Other Parallel Technologies – Apache Kafka, Nvidia Jetson, TPU.
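Topic 11 introduces stateful stream processing. The sliding-window state kept by a DStream-style windowed count can be sketched in pure Python (a hypothetical `WindowedCounter` for illustration; Spark Streaming distributes this state across the cluster):

```python
from collections import Counter, deque

class WindowedCounter:
    """Sliding-window counts over a stream of micro-batches,
    mimicking DStream-style windowed state (illustrative only)."""
    def __init__(self, window_batches):
        # state operation: retain only the last N micro-batches
        self.window = deque(maxlen=window_batches)

    def update(self, batch):
        self.window.append(Counter(batch))
        total = Counter()
        for c in self.window:   # re-aggregate over the current window
            total += c
        return total

wc = WindowedCounter(window_batches=2)
wc.update(["a", "b", "a"])
counts = wc.update(["b", "c"])
print(counts["a"], counts["b"], counts["c"])  # 2 2 1
```

Once a third batch arrives, the first batch falls out of the two-batch window and its counts disappear from the result, which is exactly the behavior a windowed DStream operation provides.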
Project
Teacher / Lecturer
Syllabus