Product detail
BEDNÁŘ, M. SCHAUER, M.
Product type
software
Abstract
The software consists of two modules:
- a tool for automatic web crawling and capturing JavaScript calls (hereinafter the "Crawler"),
- a tool for analyzing the acquired data and mining information from it (hereinafter the "Analyzer").
The Crawler (https://github.com/martinbednar/web_crawler) automatically visits websites and uses a customized version of the Web API Manager extension to capture the JavaScript calls each page makes. Individual calls are stored in a database. The tool can record and store hundreds of thousands of calls from a single website, and a crawl of the one million most visited sites yields several terabytes of data. The Crawler can also intercept JavaScript calls while a security extension (e.g. uBlock Origin) is installed. This was used to obtain two datasets: one for browsing with a security extension and one without it.
The Analyzer (https://github.com/martinbednar/web_crawler_data_analysis) processes the collected data and displays aggregated results and significant values. Given the two datasets, the Analyzer can compare JavaScript calls made with and without a security extension, which helps answer research questions about security and privacy on the Web. For example, using the tool we found that on the 250,000 most visited websites (according to the Tranco list), uBlock Origin blocked approximately 30% of all JavaScript calls, and the Range API (https://developer.mozilla.org/en-US/docs/Web/API/Range) was the one suppressed most often. The complete results were published on the FIT cloud (https://nextcloud.fit.vutbr.cz/s/LHxP4cYaTnoNHWQ).
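The comparison of the two datasets described above can be illustrated with a minimal Python sketch. It is not the Analyzer's actual code: the function name `compare_call_counts`, the input format (a mapping from API name to call count) and the sample numbers are all assumptions made for illustration. A call is counted as "blocked" when an API is called less often with the extension than without it, and increases are clamped to zero.

```python
from typing import Dict, List, Tuple


def compare_call_counts(
    baseline: Dict[str, int],
    with_extension: Dict[str, int],
) -> Tuple[float, List[Tuple[str, int]]]:
    """Compare per-API call counts from two crawls (hypothetical helper).

    baseline        -- calls per API recorded without a security extension
    with_extension  -- calls per API recorded with the extension installed
    Returns the overall share of blocked calls and the APIs ranked by the
    number of blocked calls, most affected first.
    """
    # Blocked calls per API: the drop relative to the baseline, never negative.
    blocked = {
        api: max(count - with_extension.get(api, 0), 0)
        for api, count in baseline.items()
    }
    total_baseline = sum(baseline.values())
    share = sum(blocked.values()) / total_baseline if total_baseline else 0.0
    ranking = sorted(blocked.items(), key=lambda item: item[1], reverse=True)
    return share, ranking


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the published results.
    without_ext = {"Range": 1000, "Storage": 500, "Canvas": 500}
    with_ext = {"Range": 400, "Storage": 450, "Canvas": 550}
    share, ranking = compare_call_counts(without_ext, with_ext)
    print(f"blocked share: {share:.1%}")
    print("most suppressed API:", ranking[0][0])
```

Clamping at zero keeps APIs that were called more often with the extension from offsetting the blocked total; a net difference of the two totals would be an equally reasonable definition and gives a slightly lower figure.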
Keywords
JavaScript, API, Web browser, Web crawl, Security, Privacy, Fingerprint
Create date
18. 11. 2021
Location
https://polcak.github.io/jsrestrictor-dev/blogarticles/crawling_results.html
Possibilities of use
Use of the result by another entity always requires obtaining a licence
Licence fee
The licence provider does not require a licence fee for the result