Product detail
OTRUSINA, L. SMRŽ, P. SZNAPKA, J. ŠAFÁŘ, M.
Product type
software
Abstract
We had several objectives in mind when designing the new named entity recognizer (NER). First, it should avoid the performance bottlenecks common to web-based APIs such as OpenCalais or AlchemyAPI. Second, it needs to achieve excellent precision and recall for geographical features, especially for places in Europe. Finally, the tool should perform disambiguation and normalization alongside the recognition process. To meet the second objective, we used data exported from the Geonames.org database, which contains over 10 million geographical features. Efficiency comes from finite state automaton (FSA) technology, which can handle huge lists of names and searches input texts very quickly. We employed an efficient algorithm for constructing the minimal FSA described in Daciuk et al. (1998). A freely available package provided by the first author of that paper allows building a minimal FSA from a list of predefined keywords. The resulting representation of all the relevant data from GeoNames (originally over 1.1 GB) takes only 71 MB, and processing is extremely fast.
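The idea of matching text against a name list compiled into an automaton can be illustrated with a minimal Python sketch. It is not the BURGeoN code and does not use the package mentioned above. It also swaps in a simpler construction than the incremental algorithm of Daciuk et al.: it first builds a plain trie and then merges equivalent states bottom-up, which should give the same kind of minimal acyclic automaton for a small list but without the memory savings of incremental construction. All names (State, build_trie, minimize, count_states, find_names) and the sample place names are hypothetical.

```python
from typing import Dict, List, Tuple


class State:
    """One automaton state: outgoing transitions plus a final-state flag."""

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.final = False


def build_trie(names: List[str]) -> State:
    """Build a (non-minimal) trie over the lowercased names."""
    root = State()
    for name in names:
        node = root
        for ch in name.lower():
            node = node.edges.setdefault(ch, State())
        node.final = True
    return root


def minimize(root: State) -> State:
    """Merge equivalent states bottom-up so that shared suffixes are stored once."""
    registry: Dict[tuple, State] = {}

    def canon(node: State) -> State:
        # Canonicalize the children first, then look this state up by its signature.
        for ch in list(node.edges):
            node.edges[ch] = canon(node.edges[ch])
        key = (node.final, tuple(sorted((c, id(t)) for c, t in node.edges.items())))
        return registry.setdefault(key, node)

    return canon(root)


def count_states(root: State) -> int:
    """Count the distinct states reachable from root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


def is_boundary(text: str, pos: int) -> bool:
    """True if pos does not fall between two alphanumeric characters."""
    if pos <= 0 or pos >= len(text):
        return True
    return not (text[pos - 1].isalnum() and text[pos].isalnum())


def find_names(text: str, root: State) -> List[Tuple[int, int, str]]:
    """Return (start, end, surface form) for the longest match at each position."""
    lowered = text.lower()  # assumes lowercasing keeps the string length unchanged
    matches, i = [], 0
    while i < len(lowered):
        end = -1
        if is_boundary(lowered, i):
            node, j = root, i
            while j < len(lowered) and lowered[j] in node.edges:
                node = node.edges[lowered[j]]
                j += 1
                if node.final and is_boundary(lowered, j):
                    end = j  # remember the longest match seen so far
        if end > i:
            matches.append((i, end, text[i:end]))
            i = end
        else:
            i += 1
    return matches


if __name__ == "__main__":
    names = ["Brno", "Bern", "Berlin", "Dublin", "New York", "York"]
    trie = build_trie(names)
    states_before = count_states(trie)
    automaton = minimize(trie)
    print(states_before, count_states(automaton))
    print(find_names("Flights from Brno to New York via Berlin.", automaton))
```

The sketch stores only the names themselves. Disambiguation and normalization, as described in the abstract, would additionally require attaching something like GeoNames identifiers to the final states, which is left out here.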
Keywords
named entity recognition, geonames.org, finite state automaton
Create date
15. 12. 2011
Location
www.fit.vutbr.cz/~iotrusina/BURGeoN-0.1.tar.gz
Possibilities of use
Use of the result by another entity always requires acquiring a licence
Licence fee
The licensor does not require a licence fee for the result
www
https://www.fit.vut.cz/research/product/228/