Publication detail
MILIČKA, M.; BURGET, R.
Original Title
Multi-aspect Document Content Analysis using Ontological Modelling
Type
article in a collection not indexed in WoS or Scopus
Language
English
Original Abstract
Existing methods of information extraction from web documents are usually based on a single aspect of the document or its contents, such as the code, textual features or visual features. Due to the great variability of the documents available online, it seems reasonable to combine multiple kinds of analysis in order to use all the available knowledge for identifying particular information in a document. In this paper, we propose an ontological document model that allows the results of analysing different document aspects to be integrated. We propose a generic architecture of an information extraction system based on this model and demonstrate its applicability on a practical example.
Keywords
document modeling, information extraction, page segmentation, content classification, ontology, RDF
Authors
MILIČKA, M.; BURGET, R.
RIV year
2014
Released
20. 11. 2014
Publisher
Vydavateľstvo STU
Location
Smolenice
ISBN
978-80-227-4267-2
Book
Proceedings of 9th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2014)
Pages from
9
Pages to
12
Pages count
4
URL
https://www.fit.vut.cz/research/publication/10724/
BibTeX
@inproceedings{BUT111652,
  author="Martin {Milička} and Radek {Burget}",
  title="Multi-aspect Document Content Analysis using Ontological Modelling",
  booktitle="Proceedings of 9th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2014)",
  year="2014",
  pages="9--12",
  publisher="Vydavateľstvo STU",
  address="Smolenice",
  isbn="978-80-227-4267-2",
  url="https://www.fit.vut.cz/research/publication/10724/"
}
Documents
wikt_burget.pdf