Publication detail
MOTLÍČEK, P., ČERNOCKÝ, J.
Original title
Multimodal Phoneme Recognition of Meeting Data
Type
journal article - other, Jost
Language
English
Original abstract
This paper describes experiments in automatic recognition of context-independent phoneme strings from meeting data using audio-visual features. Visual features are known to improve the accuracy and noise robustness of automatic speech recognizers. However, many problems appear when the data is not "visually clean", i.e., when the speaker's frontal pose, lighting conditions, background, etc. are not constrained. The goal of this work was to test whether visual information can help phoneme recognition with neural nets. While the audio part is fixed and uses standard Mel filter-bank energies, different features describing the video were tested: average brightness, DCT coefficients extracted from a region-of-interest (ROI), optical flow analysis, and lip-position features. The recognition was evaluated on a subset of IDIAP meeting room data. We have seen a small improvement compared to purely audio-based recognition, but further work is needed, especially on determining the reliability of the video features.
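The abstract mentions two feature streams: standard Mel filter-bank energies for the audio and, for the video, among others, DCT coefficients extracted from a mouth region-of-interest. The sketch below only illustrates that kind of front end under stated assumptions; it is not the authors' code, and the libraries (librosa, scipy), function names, and all numeric settings (sample rate, frame step, number of coefficients) are assumptions.

# Minimal illustrative sketch (not the authors' code) of the two feature
# streams named in the abstract: log Mel filter-bank energies for audio and
# low-order 2-D DCT coefficients of a grayscale mouth ROI for video.
# All numeric settings below are assumptions, not values from the paper.
import numpy as np
import librosa
from scipy.fftpack import dct

def audio_features(wav_path, n_mels=23):
    """Log Mel filter-bank energies, one vector per 10 ms frame (assumed setup)."""
    y, sr = librosa.load(wav_path, sr=16000)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=400, hop_length=160, n_mels=n_mels)
    return np.log(mel + 1e-10).T          # shape: (frames, n_mels)

def roi_dct_features(roi_gray, n_coeffs=15):
    """Low-frequency 2-D DCT coefficients of one grayscale mouth-ROI frame."""
    c = dct(dct(roi_gray.astype(float), axis=0, norm='ortho'),
            axis=1, norm='ortho')
    k = int(np.ceil(np.sqrt(n_coeffs)))   # keep a small low-frequency block
    return c[:k, :k].ravel()[:n_coeffs]

# The concatenated audio and (frame-rate-matched) video vectors would then
# feed a phoneme-classifying neural network, as outlined in the abstract.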
Keywords
speech processing, audio-video processing, feature extraction, pattern recognition
Authors
MOTLÍČEK, P., ČERNOCKÝ, J.
RIV year
2004
Published
8 September 2004
ISSN
0302-9743
Journal
Lecture Notes in Computer Science
Volume
Issue
3206
Country
Federal Republic of Germany
Pages from
379
Pages to
384
Page count
6
URL
http://www.springerlink.com/index/U0DJ8GHXE220LX81
BibTeX
@article{BUT45741,
  author="Petr {Motlíček} and Jan {Černocký}",
  title="Multimodal Phoneme Recognition of Meeting Data",
  journal="Lecture Notes in Computer Science",
  year="2004",
  volume="2004",
  number="3206",
  pages="379--384",
  issn="0302-9743",
  url="http://www.springerlink.com/index/U0DJ8GHXE220LX81"
}