Bachelor's Thesis
Data Import from Unstructured Data Sources
Author of thesis: Bc. Vojtěch Kučera
Acad. year: 2023/2024
Supervisor: Ing. Vladimír Bartík, Ph.D.
Reviewer: Ing. Ivana Burgetová, Ph.D.
Abstract: This thesis focuses on data extraction from validation protocols in the PDF format, which are generated by insurance providers. It introduces the PDF format and some of the methods used to extract data from PDF files, and describes the design and implementation of a tool for extracting data from validation protocols. The tool is implemented in Python and uses user-editable finite state machines for the extraction. The output of the program is a single file in one of the following formats: txt, csv, xlsx, xml, sql. The sql output is designed to save data to a database table used by STAPRO s.r.o.
Keywords: PDF, extractor, data extraction, validation protocol, insurance provider, finite state machine, FSM, Python
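The thesis text itself is not yet published, so the following is only a minimal sketch of the general idea named in the abstract: a finite state machine, defined by an editable transition table, that walks over text lines and collects fields. It assumes the text has already been extracted from the PDF into lines. The state names, line prefixes, field names, and sample lines are all hypothetical and are not taken from the thesis or from any real validation protocol.

```python
import csv
import io

# Hypothetical, user-editable transition table:
# state -> list of (line prefix, field name to fill, next state).
TRANSITIONS = {
    "start": [("Protocol:", "protocol_id", "header")],
    "header": [("Provider:", "provider", "body")],
    "body": [("Errors:", "error_count", "done")],
}
# States in which the machine may stop with a complete record (assumed).
ACCEPTING = {"done"}


def run_fsm(lines):
    """Feed text lines through the machine; return (fields, final state)."""
    state, fields = "start", {}
    for line in lines:
        for prefix, field, next_state in TRANSITIONS.get(state, []):
            if line.startswith(prefix):
                # Store the text after the prefix and move to the next state.
                fields[field] = line[len(prefix):].strip()
                state = next_state
                break
        # Lines that match no transition are skipped; the state is unchanged.
    return fields, state


def to_csv(fields):
    """Serialize one extracted record as CSV text (header row plus data row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerow(fields)
    return buf.getvalue()


# Made-up sample input standing in for text extracted from a PDF.
sample = [
    "Protocol: 2024-001",
    "some unrelated line",
    "Provider: Example Insurer",
    "Errors: 3",
]
fields, state = run_fsm(sample)
if state in ACCEPTING:
    print(to_csv(fields), end="")
else:
    print(f"incomplete record, machine stopped in state {state!r}")
```

In this sketch, a run that ends outside an accepting state signals that the input did not contain every expected field, which is one possible way to surface incomplete protocols in the output.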
Date of defence
12.06.2024
Date of publication
12.06.2027
Result of the defence
Defended (thesis was successfully defended)
Grading
C
Process of defence
The student first presented the results achieved in his thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions, the committee decided to grade the thesis C.
Topics for thesis defence
- Why do your automata distinguish between final and non-final states? How is the state in which the automaton finished its run reflected in the tool's output?
- What do the final states in your automata represent? Error states?
- Why does your application fail on VZP protocols?
Language of thesis
Czech
Faculty
Department
Study programme
Information Technology (BIT)
Composition of Committee
doc. Dr. Ing. Dušan Kolář (chair)
Ing. Vladimír Bartík, Ph.D. (member)
Ing. Jaroslav Dytrych, Ph.D. (member)
doc. Mgr. Adam Rogalewicz, Ph.D. (member)
Ing. Marcela Zachariášová, Ph.D. (member)
Supervisor’s report
Ing. Vladimír Bartík, Ph.D.
Grade proposed by supervisor: B
Reviewer’s report
Ing. Ivana Burgetová, Ph.D.
Grade proposed by reviewer: C
Reasons for publication postponement
Publication of the bachelor's thesis has been postponed by 3 years in compliance with Section 47b (4) of Act No. 111/1998 Coll., on Higher Education Institutions and on Amendments and Supplements to Other Acts (Higher Education Act), as amended. The reasons for the postponement are the protection of intellectual property and the fact that the thesis contains trade secrets within the meaning of the relevant provisions of Act No. 89/2012 Coll., Civil Code.