Publication detail
BURGET, R.
Original title
Visual Area Classification for Article Identification in Web Documents
Type
paper in proceedings not indexed in WoS or Scopus
Language
English
Original abstract
On the World Wide Web, news and other articles are usually published in complex HTML documents that contain many types of additional information that is not explicitly marked. In this paper, we propose a visual information analysis approach to article discovery in complex HTML documents. We use a classification approach to identify the important parts of the article within the page, and we propose an algorithm for detecting the article bounds within the page. Finally, we provide the results of an experimental evaluation.
Keywords
article extraction, document cleaning, page segmentation, visual analysis
Authors
RIV year
2010
Published
30 August 2010
Publisher
IEEE Computer Society
Place
Bilbao
ISBN
978-0-7695-4174-7
Book
21st International Workshop on Databases and Expert Systems Applications
Pages from
171
Pages to
175
Number of pages
5
BibTeX
@inproceedings{BUT35628,
  author="Radek {Burget}",
  title="Visual Area Classification for Article Identification in Web Documents",
  booktitle="21st International Workshop on Databases and Expert Systems Applications",
  year="2010",
  pages="171--175",
  publisher="IEEE Computer Society",
  address="Bilbao",
  isbn="978-0-7695-4174-7"
}