Publication detail
BURGET, R.
Original title
Layout Based Information Extraction from HTML Documents
Type
Conference proceedings paper indexed in WoS or Scopus
Language
English
Original abstract
We propose a method of information extraction from HTML documents based on modelling the visual information in the document. A page segmentation algorithm is used for detecting the document layout and subsequently, the extraction process is based on the analysis of mutual positions of the detected blocks and their visual features. This approach is more robust than the traditional DOM-based methods and it opens new possibilities for the extraction task specification.
Keywords
page segmentation, layout analysis, information extraction
Authors
RIV year
2007
Published
23 September 2007
Publisher
IEEE Computer Society
Place
Curitiba
ISBN
0-7695-2822-8
Book
9th International Conference on Document Analysis and Recognition ICDAR 2007
Pages from
624
Pages to
629
Number of pages
6
BibTeX
@inproceedings{BUT28821,
  author="Radek {Burget}",
  title="Layout Based Information Extraction from HTML Documents",
  booktitle="9th International Conference on Document Analysis and Recognition ICDAR 2007",
  year="2007",
  pages="624--629",
  publisher="IEEE Computer Society",
  address="Curitiba",
  isbn="0-7695-2822-8"
}