Publication detail
BARTÍK, V.; BURGET, R.
Original Title
Two-Phase Categorization of Web Documents
Type
Article in a collection (not indexed in WoS or Scopus)
Language
English
Original Abstract
The number of pages on the World Wide Web is constantly growing, and there is a need to process them efficiently and extract useful knowledge from them. Web page categorization is an important problem in this area. The method proposed here takes both visual and textual information into account and consists of two phases. In the first phase, the page areas obtained by segmentation are classified by their visual properties. In the second phase, whole pages are classified using the results of the first phase together with textual information. The final part of the paper presents several experiments with pages taken from news web sites.
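The abstract describes the pipeline only at a high level. The following is a minimal sketch of the general two-phase idea, and it relies on assumptions that are not from the paper. The visual features (`font_size`, `bold`), the threshold rules in the hypothetical `classify_block`, the `AREA_WEIGHT` multipliers, and the nearest-centroid classifier over TF-IDF vectors are all illustrative stand-ins. It is not the authors' method. Phase 1 labels each segmented area from visual features alone. Phase 2 scales each term by the label of the area it appears in, applies smoothed TF-IDF weighting, and assigns the page to the category whose centroid is most similar by cosine.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Block:
    """One area produced by page segmentation (hypothetical feature set)."""
    text: str
    font_size: float  # average font size of the area, in points
    bold: bool


# Phase 1: label each area from its visual properties alone.
# Hand-written thresholds stand in for a trained visual classifier.
def classify_block(block: Block) -> str:
    if block.font_size >= 18:
        return "headline"
    if block.font_size >= 13 or block.bold:
        return "subheading"
    return "body"


# Hypothetical weights: words in visually prominent areas count more.
AREA_WEIGHT = {"headline": 3.0, "subheading": 2.0, "body": 1.0}


def weighted_tf(blocks: list[Block]) -> Counter:
    """Term frequencies where each occurrence is scaled by its area's phase-1 label."""
    tf = Counter()
    for block in blocks:
        weight = AREA_WEIGHT[classify_block(block)]
        for term in block.text.lower().split():
            tf[term] += weight
    return tf


def smoothed_idf(corpus: list[Counter]) -> dict:
    """IDF per term, log((1 + N) / (1 + df)) + 1, computed over the training pages."""
    n = len(corpus)
    df = Counter(term for tf in corpus for term in tf)
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(tf: Counter, idf: dict, n_docs: int) -> dict:
    # Terms unseen in training get the smoothed IDF for df = 0.
    default = math.log(1 + n_docs) + 1
    return {t: c * idf.get(t, default) for t, c in tf.items()}


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Phase 2: classify whole pages with a nearest-centroid model over the
# visually weighted TF-IDF vectors.
class PageClassifier:
    def fit(self, pages: list[list[Block]], labels: list[str]) -> "PageClassifier":
        tfs = [weighted_tf(p) for p in pages]
        self.n = len(tfs)
        self.idf = smoothed_idf(tfs)
        sums = defaultdict(Counter)
        for tf, label in zip(tfs, labels):
            sums[label].update(tfidf(tf, self.idf, self.n))
        # A per-class sum points in the same direction as the class mean,
        # which is all the cosine comparison needs.
        self.centroids = dict(sums)
        return self

    def predict(self, page: list[Block]) -> str:
        vec = tfidf(weighted_tf(page), self.idf, self.n)
        return max(self.centroids, key=lambda c: cosine(vec, self.centroids[c]))
```

To use it, call `PageClassifier().fit(pages, labels).predict(new_page)`, which returns the label of the closest category centroid. Replacing the hand-written rules in `classify_block` with a learned visual classifier would leave the phase-2 code unchanged.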
Keywords
Web page categorization, visual block classification, term weighting, TF-IDF, page segmentation
Authors
BARTÍK, V.; BURGET, R.
RIV year
2010
Released
1 November 2010
Publisher
Institute for Systems and Technologies of Information, Control and Communication
Location
Valencia
ISBN
978-989-8425-28-7
Book
Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
Pages from
458
Pages to
462
Pages count
5
BibTex
@inproceedings{BUT34415,
  author="Vladimír {Bartík} and Radek {Burget}",
  title="Two-Phase Categorization of Web Documents",
  booktitle="Proceedings of the International Conference on Knowledge Discovery and Information Retrieval",
  year="2010",
  pages="458--462",
  publisher="Institute for Systems and Technologies of Information, Control and Communication",
  address="Valencia",
  isbn="978-989-8425-28-7"
}