Publication detail
NADIMPALLI, V.; KESIRAJU, S.; BANKA, R.; KETHIREDDY, R.; GANGASHETTY, S.
Original title
Resources and Benchmarks for Keyword Search in Spoken Audio From Low-Resource Indian Languages
Type
journal article in Web of Science, Jimp
Language
English
Original abstract
This paper presents the resources and benchmarks developed for keyword search (KWS) in spoken audio from six low-resource Indian languages (from two language families), namely Gujarati, Hindi, Marathi, Odia, Tamil, and Telugu. The current work on constructing keywords and building benchmark KWS systems is inspired by the popular IARPA Babel program and subsequent work on low-resource KWS. The keywords are constructed by taking into account their properties, i.e., occurrence, length, and average confusability, and the effect of these properties on the evaluation metric, the term-weighted value (TWV). We use freely available speech datasets and reprocess them to create resources for KWS, thereby adding value to the existing speech resources. Four ASR-based KWS systems are built, and their performance is analyzed across the three keyword properties in all six languages. The prepared keywords and the other resources needed to replicate our experiments are publicly available. We believe that the analysis and guidelines provided in this paper will help not only the research community but also practitioners and engineers to easily create KWS resources for new languages, datasets, and scenarios.
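The abstract names the term-weighted value (TWV) as the evaluation metric without defining it. As a hedged illustration, the sketch below assumes the standard definition from the NIST 2006 Spoken Term Detection evaluation, which the IARPA Babel program also used: TWV = 1 - (1/K) * sum over keywords k of [P_miss(k) + beta * P_FA(k)], with beta = 999.9, averaged over the K keywords that occur in the reference, at one fixed decision threshold. The names `KeywordCounts` and `term_weighted_value` are hypothetical, and this is not the scoring tool used in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class KeywordCounts:
    """Hypothetical per-keyword tallies at one fixed decision threshold."""
    n_true: int         # occurrences of the keyword in the reference transcript
    n_correct: int      # system hits that match a reference occurrence
    n_false_alarm: int  # system hits with no matching reference occurrence


def term_weighted_value(counts: Sequence[KeywordCounts],
                        total_speech_seconds: float,
                        beta: float = 999.9) -> float:
    """TWV following the NIST 2006 STD definition (assumed here, not taken from the paper).

    Keywords with no reference occurrences are excluded from the average,
    because their miss probability is undefined.
    """
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no keyword has reference occurrences")
    total_cost = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # NIST approximates the number of non-target trials as one per second
        # of speech, minus the true occurrences of the keyword.
        p_fa = c.n_false_alarm / (total_speech_seconds - c.n_true)
        total_cost += p_miss + beta * p_fa
    return 1.0 - total_cost / len(scored)
```

For example, two keywords with 2 of 2 and 3 of 4 occurrences found and no false alarms give TWV = 1 - (0 + 0.25)/2 = 0.875. Because beta is large, even a handful of false alarms per keyword lowers the score noticeably, which is one reason keyword properties such as occurrence count and confusability affect TWV.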
Keywords
Keyword search, low-resource languages, term-weighted value (TWV)
Authors
NADIMPALLI, V.; KESIRAJU, S.; BANKA, R.; KETHIREDDY, R.; GANGASHETTY, S.
Published
28 March 2022
ISSN
2169-3536
Journal
IEEE Access
Volume
10
Issue
2022
Country
United States of America
First page
34789
Last page
34799
Number of pages
11
URL
https://ieeexplore.ieee.org/document/9743904
BibTex
@article{BUT182528, author="NADIMPALLI, V. and KESIRAJU, S. and BANKA, R. and KETHIREDDY, R. and GANGASHETTY, S.", title="Resources and Benchmarks for Keyword Search in Spoken Audio From Low-Resource Indian Languages", journal="IEEE Access", year="2022", volume="10", number="2022", pages="34789--34799", doi="10.1109/ACCESS.2022.3162854", issn="2169-3536", url="https://ieeexplore.ieee.org/document/9743904" }