Course detail

Processor Architecture

FIT-ACH Acad. year: 2018/2019

The course covers the architecture of general-purpose as well as special-purpose processors. Instruction-level parallelism (ILP) is studied on scalar, superscalar and VLIW processors. Processors with thread-level parallelism (TLP) are then discussed. Data parallelism is illustrated on SIMD streaming instructions and on graphics processors (SIMT). Parallelization of numerical calculations for GPUs (CUDA) is also covered. Finally, techniques used in low-power processors are explained.

Language of instruction

Czech

Number of ECTS credits

5

Mode of study

Not applicable.

Learning outcomes of the course unit

An overview of processor microarchitecture and its future trends. The ability to compare processors and, using suitable tools, to simulate the influence of changes in their architecture. Familiarity with processor performance measurement. Knowledge of the architecture and hardware support for parallel computation on graphics processors, which can be applied directly to accelerate computationally intensive calculations.

Prerequisites

Von Neumann computer architecture, memory hierarchy, assembly language programming, the tasks and functions of a compiler.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

Assessment of two projects (13 hours in total), computer laboratories, and a midterm examination.
Exam prerequisites:
At least 20 of the 40 points available for the projects and the midterm examination.

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

To familiarize students with the architecture of the latest processors exploiting instruction-level, thread-level and data-level parallelism. To clarify the role of the compiler and its cooperation with the CPU. To enable students to find their way in the processor market and to evaluate and compare various CPUs. Further, to familiarize students with the architecture of graphics processors and its use for accelerating numerical calculations (GPGPU), and with low-power techniques in processors for mobile applications.

Specification of controlled education, way of implementation and compensation for absences

  • Missed labs can be made up on alternative dates (Monday or Friday).
  • Additional slots for missed labs will be available in the last week of the semester.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Not applicable.

Recommended reading

Agner Fog: Software Optimization Resources
Baer, J.L.: Microprocessor Architecture. Cambridge University Press, 2010, 367 pp., ISBN 978-0-521-76992-1
Current PPT lecture slides
Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach. 5th edition, Morgan Kaufmann Publishers, Inc., 2012, 493 pp., ISBN 978-0-12-383872-8
http://inst.eecs.berkeley.edu/~cs152/sp13/
https://www.anandtech.com
Intel Architecture Optimization Manual
Jeffers, J., Reinders, J.: Intel Xeon Phi Coprocessor High-Performance Programming. Morgan Kaufmann, 2013, 432 pp., ISBN 978-0-124-10414-3
Kirk, D., Hwu, W.: Programming Massively Parallel Processors: A Hands-on Approach. Elsevier, 2010, 256 pp., ISBN 978-0-12-381472-2
NVIDIA CUDA SDK Manual

Classification of course in study plans

  • Programme IT-MSC-2 Master's

    branch MBI, year of study 0, winter semester, elective
    branch MSK, year of study 2, winter semester, compulsory-optional
    branch MMM, year of study 0, winter semester, elective
    branch MBS, year of study 0, winter semester, compulsory-optional
    branch MPV, year of study 2, winter semester, compulsory
    branch MIS, year of study 0, winter semester, elective
    branch MIN, year of study 0, winter semester, elective
    branch MGM, year of study 2, winter semester, elective

Type of course unit

 

Lecture

26 hours, optional

Teacher / Lecturer

Syllabus

  1. Scalar processors. Pipelined instruction processing and compiler assistance.
  2. Superscalar CPU. Dynamic instruction scheduling, branch prediction.
  3. Advanced superscalar processing techniques: register renaming, data flow through the memory hierarchy.
  4. Optimization of instruction and data fetching. Examples of superscalar CPUs.
  5. Multi-threaded processors.
  6. Data parallelism. SIMD extensions and vectorization.
  7. Architecture of graphics processing units, SIMT programming model.
  8. CUDA programming language, thread and memory model.
  9. Synchronisation and reduction on GPU, design and tuning of GPU codes.
  10. Stream processing, multi-GPU systems, GPU libraries.
  11. Architecture of many-core systems (MIC, Xeon Phi) and their programming.
  12. VLIW processors. Software pipelining, predication, binary translation.
  13. Low-power processors.

Exercise in computer lab

10 hours, compulsory

Teacher / Lecturer

Syllabus

  1. Performance measurement for sequential codes.
  2. Vectorisation using OpenMP 4.0.
  3. CUDA: Memory transfers, simple kernels.
  4. CUDA: Shared memory.
  5. CUDA: Texture and constant memory, reduction operation.

Project

16 hours, compulsory

Teacher / Lecturer

Syllabus

  • Performance evaluation and code optimization using OpenMP 4.0
  • Acceleration of a computational job using CUDA 8.0