Course detail
Data Storage and Preparation
FIT-UPA Acad. year: 2023/2024
The course focuses on modern database systems as typical data sources for knowledge discovery, and on preparing data for knowledge discovery. It covers extended relational (object-relational, with support for XML and JSON documents), spatial, and NoSQL database systems. For each, the corresponding database model, the way of working with data, and some indexing methods are explained. In the context of the knowledge discovery process, attention is paid to descriptive characteristics of data and to visualization techniques used for data understanding. In addition, approaches to typical data pre-processing tasks for knowledge discovery, such as data cleaning, integration, transformation, and reduction, are explained. Approaches to information extraction from the web and several real case studies are also presented.
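As an illustration of the descriptive characteristics mentioned above, the following is a minimal sketch, not course material. It uses only the Python standard library; the function names `describe` and `pearson` and the sample data are hypothetical and chosen for this example.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive characteristics of a numeric sample."""
    # Quartile cut points (Python's default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sample data, for illustration only.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9]
scores = [2 * a + 1 for a in ages]  # perfectly linear in ages

summary = describe(ages)
correlation = pearson(ages, scores)
```

Summaries of this kind are typically computed before any cleaning or transformation, since they expose outliers, skew, and strongly correlated attributes.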
Language of instruction
Number of ECTS credits
Mode of study
Guarantor
Department
Entry knowledge
- Fundamentals of relational databases and SQL.
- Object-oriented paradigm.
- Fundamentals of XML.
- Fundamentals of computational geometry.
- Fundamentals of statistics and probability.
Rules for evaluation and completion of the course
- Mid-term written exam; there is no resit; excused absences are handled by the guarantor's deputy.
- Implementation and submission of the project results by the prescribed deadlines; excused absences are handled by the assistant.
- Final exam; at least 20 points must be obtained from the final exam (otherwise, no points will be assigned to the student); excused absences are handled by the guarantor's deputy.
Aims
Students will be able to store and manipulate data in suitable database systems, and to explore and prepare data for modelling within the knowledge discovery process.
- The student is better able to work with data in various situations.
- The student improves at solving small projects in a small team.
Study aids
Prerequisites and corequisites
Basic literature
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1 (EN)
Kim, W. (ed.): Modern Database Systems, ACM Press, 1995, ISBN 0-201-59098-0 (EN)
Lemahieu, W., Broucke, S., Baesens, B.: Principles of Database Management. Cambridge University Press. 2018, 780 p. (EN)
Melton, J.: Advanced SQL: 1999 - Understanding Object-Relational and Other Advanced Features. Morgan Kaufmann, 2002, 562 p., ISBN 1-558-60677-7 (EN)
Shekhar, S., Chawla, S.: Spatial Databases: A Tour, Prentice Hall, 2002/2003, 262 p., ISBN 0-13-017480-7 (EN)
Skiena, S.S.: The Data Science Design Manual. Springer, 2017, 445 p., ISBN 978-3-319-55443-3 (EN)
Recommended reading
Elearning
Classification of course in study plans
- Programme MITAI Master's
specialization NSPE, 1 year of study, winter semester, compulsory
specialization NBIO, 1 year of study, winter semester, compulsory
specialization NSEN, 1 year of study, winter semester, compulsory
specialization NVIZ, 1 year of study, winter semester, compulsory
specialization NGRI, 0 year of study, winter semester, compulsory
specialization NADE, 1 year of study, winter semester, compulsory
specialization NISD, 1 year of study, winter semester, compulsory
specialization NMAT, 0 year of study, winter semester, compulsory
specialization NSEC, 0 year of study, winter semester, compulsory
specialization NISY up to 2020/21, 0 year of study, winter semester, compulsory
specialization NCPS, 1 year of study, winter semester, compulsory
specialization NHPC, 0 year of study, winter semester, compulsory
specialization NNET, 1 year of study, winter semester, compulsory
specialization NMAL, 1 year of study, winter semester, compulsory
specialization NVER, 0 year of study, winter semester, compulsory
specialization NIDE, 1 year of study, winter semester, compulsory
specialization NEMB, 0 year of study, winter semester, compulsory
specialization NISY, 0 year of study, winter semester, compulsory
specialization NEMB up to 2021/22, 0 year of study, winter semester, compulsory
Type of course unit
Lecture
Teacher / Lecturer
Syllabus
- Introduction, object-oriented approach in databases.
- NoSQL databases I - introduction to NoSQL, CAP theorem and BASE, key-value databases, data partitioning and distribution.
- NoSQL databases II - data models in NoSQL databases (column, document, and graph databases), querying and data aggregation, NewSQL databases.
- Data preparation - data understanding: descriptive characteristics, visualization techniques, correlation analysis.
- Data preparation - data pre-processing I: data cleaning and integration.
- Data preparation - data pre-processing II: data reduction, imbalanced data, data transformation, other data pre-processing tasks.
- Midterm exam.
- Web scraping.
- Semantic web and linked data.
- Languages and systems for knowledge discovery, real case studies.
- Support for working with XML and JSON documents in databases.
- Spatial databases.
- Indexing of multidimensional data.
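To make the data pre-processing topics in the syllabus above more concrete, here is a minimal sketch of one cleaning step (filling missing values with the median) and one transformation step (min-max normalization). It is an illustrative assumption, not the method taught in the course; the function names `impute_median` and `min_max_scale` and the sample values are hypothetical.

```python
import statistics


def impute_median(values):
    """Data cleaning: replace missing values (None) with the median
    of the known values."""
    known = [v for v in values if v is not None]
    fill = statistics.median(known)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw attribute with one missing measurement.
raw = [10, None, 30, 20]
cleaned = impute_median(raw)
scaled = min_max_scale(cleaned)
```

Median imputation is only one option; whether it is appropriate depends on why the values are missing and on the model that will consume the data.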
Seminar
Teacher / Lecturer
Syllabus
- Objects and documents in databases
- NoSQL databases
- Knowledge discovery from data - data preprocessing
Exercise in computer lab
Teacher / Lecturer
Syllabus
- Objects and documents in databases
- NoSQL databases
- Knowledge discovery from data - data preprocessing
Project
Teacher / Lecturer
Syllabus
Creating an application for processing large structured and unstructured data, which includes, among other things, obtaining and retrieving data, preparing it for further use (e.g., knowledge discovery in databases), and creating descriptive characteristics for selected data.