Publication detail
BURGET, R.
Original title
Analyzing Logical Structure of a Web Site
Type
paper in conference proceedings not indexed in WoS or Scopus
Language
English
Original abstract
Today's World Wide Web consists mainly of documents written in Hypertext Markup Language (HTML). This language was developed for describing the look of documents and their references to other documents, and therefore it has very poor facilities for describing the semantics and the structure of the contained data. Moreover, some of these facilities are often not used by the authors of the documents, or they are not used in an appropriate way. In our work, we attempt to analyze the look and the structure of a Web site as represented by the facilities of the HTML language and to create its logical model, which would represent the data relations the same way a human user would see them. We propose a tree representation of a Web site and algorithms for the analysis of the most important HTML constructions: section headings, lists, tables and links.
Keywords
HTML analysis, Semi-structured data, Information extraction
Authors
RIV year
2002
Published
4 April 2002
Place
Ostrava
ISBN
80-85988-70-4
Book
Proceedings of 5th International Conference ISM '02 - Information Systems Modelling
Pages from
29
Pages to
35
Number of pages
7
URL
http://www.fit.vutbr.cz/~burgetr/publications/ism2002.ps
BibTeX
@inproceedings{BUT10013,
  author="Radek {Burget}",
  title="Analyzing Logical Structure of a Web Site",
  booktitle="Proceedings of 5th International Conference ISM '02 - Information Systems Modelling",
  year="2002",
  pages="29--35",
  address="Ostrava",
  isbn="80-85988-70-4",
  url="http://www.fit.vutbr.cz/~burgetr/publications/ism2002.ps"
}