Course detail

Data Storage and Preparation

FIT-UPA, Acad. year: 2022/2023

The course focuses on modern database systems as typical data sources for knowledge discovery, and on the preparation of data for knowledge discovery. It covers extended relational (object-relational, with support for working with XML and JSON documents), spatial, and NoSQL database systems, explaining the corresponding database models, the ways of working with data, and selected indexing methods. In the context of the knowledge discovery process, attention is paid to the descriptive characteristics of data and to the visualization techniques used for data understanding. In addition, approaches to typical data pre-processing tasks for knowledge discovery, such as data cleaning, integration, transformation, and reduction, are explained. Approaches to extracting information from the web are also presented, along with several real case studies.

Language of instruction

Czech

Number of ECTS credits

5

Mode of study

Not applicable.

Learning outcomes of the course unit

Students will be able to store and manipulate data in suitable database systems, and to explore data and prepare it for modelling within the knowledge discovery process.

  • Students are better able to work with data in various situations.
  • Students improve their ability to solve small projects in small teams.

Prerequisites

  • Fundamentals of relational databases and SQL.
  • Object-oriented paradigm.
  • Fundamentals of XML.
  • Fundamentals of computational geometry.
  • Fundamentals of statistics and probability.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

  • Mid-term exam, held on a single date only; there is no resit.
  • Project, to be completed and submitted by the given deadlines during the term.

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

The aim of the course is to explain the historical development of database technologies, the motivation for knowledge discovery from data, and the basic steps of the knowledge discovery process; to explain the essence, properties, and use of extended relational and NoSQL databases as data sources for knowledge discovery; and to explain the approaches and methods used for data understanding and data pre-processing for knowledge discovery.

Specification of controlled education, way of implementation and compensation for absences

  • Mid-term written exam; there is no resit; excused absences are handled by the deputy course guarantor.
  • Implementation of the project and submission of its results by the prescribed deadlines; excused absences are handled by the assistant.
  • Final exam; the minimum number of points that must be obtained from the final exam is 20 (otherwise, no points will be assigned to the student); excused absences are handled by the deputy course guarantor.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Gaede, V., Günther, O.: Multidimensional Access Methods, ACM Computing Surveys, Vol. 30, No. 2, 1998, pp. 170-231. (EN)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1 (EN)
Kim, W. (ed.): Modern Database Systems, ACM Press, 1995, ISBN 0-201-59098-0 (EN)
Lemahieu, W., Broucke, S., Baesens, B.: Principles of Database Management. Cambridge University Press, 2018, 780 p. (EN)
Melton, J.: Advanced SQL:1999 - Understanding Object-Relational and Other Advanced Features. Morgan Kaufmann, 2002, 562 p., ISBN 1-558-60677-7 (EN)
Shekhar, S., Chawla, S.: Spatial Databases: A Tour, Prentice Hall, 2002/2003, 262 p., ISBN 0-13-017480-7 (EN)
Skiena, S.S.: The Data Science Design Manual. Springer, 2017, 445 p., ISBN 978-3-319-55443-3 (EN)

Recommended reading

Dunckley, L.: Multimedia Databases: An Object-Relational Approach. Pearson Education, 2003, 464 p., ISBN 0-201-78899-3 (EN)

Elearning

Classification of course in study plans

  • Programme MITAI Master's

    specialization NADE, 1 year of study, winter semester, compulsory
    specialization NBIO, 1 year of study, winter semester, compulsory
    specialization NCPS, 1 year of study, winter semester, compulsory
    specialization NEMB, 0 year of study, winter semester, compulsory
    specialization NGRI, 0 year of study, winter semester, compulsory
    specialization NHPC, 0 year of study, winter semester, compulsory
    specialization NIDE, 1 year of study, winter semester, compulsory
    specialization NISD, 1 year of study, winter semester, compulsory
    specialization NISY up to 2020/21, 0 year of study, winter semester, compulsory
    specialization NMAL, 1 year of study, winter semester, compulsory
    specialization NMAT, 0 year of study, winter semester, compulsory
    specialization NNET, 1 year of study, winter semester, compulsory
    specialization NSEC, 0 year of study, winter semester, compulsory
    specialization NSEN, 1 year of study, winter semester, compulsory
    specialization NSPE, 1 year of study, winter semester, compulsory
    specialization NVER, 0 year of study, winter semester, compulsory
    specialization NVIZ, 1 year of study, winter semester, compulsory
    specialization NISY, 0 year of study, winter semester, compulsory
    specialization NEMB up to 2021/22, 0 year of study, winter semester, compulsory

Type of course unit

 

Lecture

26 hrs., optional

Teacher / Lecturer

Syllabus

  1. History of database technology and knowledge discovery, process of knowledge discovery.
  2. Object-oriented approach in databases.
  3. NoSQL databases I - introduction to NoSQL, CAP theorem and BASE, key-value databases, data partitioning and distribution.
  4. NoSQL databases II - data models in NoSQL databases (column, document, and graph databases), querying and data aggregation, NewSQL databases.
  5. Web scraping.
  6. Data preparation - data understanding: descriptive characteristics, visualization techniques, correlation analysis.
  7. Data preparation - data pre-processing I: data cleaning and integration.
  8. Data preparation - data pre-processing II: data reduction, imbalanced data, data transformation, other data pre-processing tasks.
  9. Mid-term exam
  10. Languages and systems for knowledge discovery, real case studies.
  11. Support for working with XML and JSON documents in databases.
  12. Spatial databases.
  13. Indexing of multidimensional data.

Fundamentals seminar

6 hrs., optional

Teacher / Lecturer

Syllabus

  1. Objects and documents in databases
  2. NoSQL databases
  3. Knowledge discovery from data - data preprocessing

Exercise in computer lab

6 hrs., optional

Teacher / Lecturer

Syllabus

  1. Objects and documents in databases
  2. NoSQL databases
  3. Knowledge discovery from data - data preprocessing

Project

14 hrs., compulsory

Teacher / Lecturer

Syllabus

Creating an application for processing large structured and unstructured data, which includes, among other things, obtaining and retrieving data, preparing them for further use (e.g., knowledge discovery in databases) and creating descriptive characteristics for selected data.
