Publication detail
BURGET, R.
Original Title
Visual Area Classification for Article Identification in Web Documents
Type
article in a collection not indexed in WoS or Scopus
Language
English
Original Abstract
On the World Wide Web, news and other articles are usually published in complex HTML documents containing many types of additional information that is not explicitly marked. In this paper, we propose a visual information analysis approach to article discovery in complex HTML documents. We use a classification approach for the identification of the important parts of the article within the page, and we propose an algorithm for the detection of the article bounds within the page. Finally, we provide the results of an experimental evaluation.
Keywords
article extraction, document cleaning, page segmentation, visual analysis
Authors
RIV year
2010
Released
30 August 2010
Publisher
IEEE Computer Society
Location
Bilbao
ISBN
978-0-7695-4174-7
Book
21st International Workshop on Databases and Expert Systems Applications
Pages from
171
Pages to
175
Pages count
5
BibTeX
@inproceedings{BUT35628,
  author="Radek {Burget}",
  title="Visual Area Classification for Article Identification in Web Documents",
  booktitle="21st International Workshop on Databases and Expert Systems Applications",
  year="2010",
  pages="171--175",
  publisher="IEEE Computer Society",
  address="Bilbao",
  isbn="978-0-7695-4174-7"
}