Product detail
BAKO, M.; BUCHAL, P.; HRADIŠ, M.
Product type
software
Abstract
This tool automatically assesses the quality of digitized documents. The estimated quality scores closely correspond to human readability. The tool outputs quality score heatmaps as well as an overall quality score for a whole document page. The module computes local perceptual quality scores either from Optical Character Recognition (OCR) confidence scores or directly with a fast convolutional neural network. It is built on top of the OCR developed in the PERO project (pero-ocr). Text recognition works in multiple stages. First, the locations and heights of text lines are determined by a fully convolutional neural network (a modified U-Net). The individual text lines are then processed by convolutional-recurrent networks trained with the Connectionist Temporal Classification (CTC) loss. These networks provide confidences for the recognized characters, which are locally mapped to perceptual scores. The mapping to perceptual scores was calibrated on a large dataset of readability ratings by human readers.
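The OCR-based path described above has three steps. Character confidences come out of the CTC-trained line recognizer. They are mapped to perceptual scores and then aggregated into a heatmap and a page score. The NumPy sketch below illustrates that flow under several assumptions. It decodes with greedy (best-path) CTC decoding. A character's confidence is taken to be its highest softmax probability over the frames that emit it. The calibration is a piecewise-linear curve whose knots (CONF_KNOTS, SCORE_KNOTS) are hypothetical placeholders, not the curve fitted in pero-quality. The heatmap uses square cells, and the page score is a plain mean. None of the function names belong to the pero-quality or pero-ocr API.

```python
import numpy as np

# Hypothetical calibration knots: placeholders for illustration only,
# NOT the curve fitted on human readability ratings in pero-quality.
CONF_KNOTS = np.array([0.0, 0.5, 0.8, 0.95, 1.0])
SCORE_KNOTS = np.array([0.0, 0.1, 0.4, 0.8, 1.0])


def ctc_greedy_confidences(logits, blank=0):
    """Greedy (best-path) CTC decoding of one text line.

    logits: (T, C) array of per-frame class scores.
    Returns the decoded label sequence and one confidence per character,
    taken here as the highest softmax probability among the frames that
    emitted it (an assumed definition of character confidence).
    """
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    labels, confs = [], []
    prev = blank
    for t, c in enumerate(best):
        if c != blank:
            p = float(probs[t, c])
            if c != prev:  # new character; repeats without a blank are collapsed
                labels.append(int(c))
                confs.append(p)
            else:  # the same character continues over several frames
                confs[-1] = max(confs[-1], p)
        prev = c
    return labels, confs


def perceptual_scores(confs):
    """Map OCR confidences to perceptual scores by piecewise-linear interpolation."""
    return np.interp(np.asarray(confs, dtype=float), CONF_KNOTS, SCORE_KNOTS)


def quality_heatmap(points, scores, page_shape, cell=64):
    """Average per-character scores over square cells of a page.

    points: (x, y) pixel positions of characters; page_shape: (height, width).
    Cells that contain no text are NaN.
    """
    h, w = page_shape
    rows, cols = -(-h // cell), -(-w // cell)  # ceiling division
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    for (x, y), s in zip(points, scores):
        r, c = int(y) // cell, int(x) // cell
        total[r, c] += s
        count[r, c] += 1
    heat = np.full((rows, cols), np.nan)
    filled = count > 0
    heat[filled] = total[filled] / count[filled]
    return heat


def page_score(scores):
    """Overall page score, assumed here to be the mean character score."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()) if scores.size else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(40, 6)) * 3.0  # random stand-in for network output
    labels, confs = ctc_greedy_confidences(logits)
    scores = perceptual_scores(confs)
    # Hypothetical layout: characters 8 px apart on a single line at y = 100.
    points = [(8 * i, 100) for i in range(len(scores))]
    print(quality_heatmap(points, scores, page_shape=(256, 512)))
    print(page_score(scores))
```

To use a real calibration, replace the `np.interp` call with a fitted monotone regressor. Cell averaging is likewise just one plausible way to turn local scores into a heatmap.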
Keywords
OCR, document, text quality, readability, convolutional networks
Date of creation
20 December 2019
Location
https://github.com/DCGM/pero-quality
Terms of use
Use of the result by another entity always requires obtaining a license
License fee
The licensor does not require a license fee for the result