Publication detail

SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics

DELCROIX, M.; ŽMOLÍKOVÁ, K.; KINOSHITA, K.; ARAKI, S.; OGAWA, A.; NAKATANI, T.

Original Title

Type

journal article in Scopus

Language

English

Original Abstract

In a noisy environment such as a cocktail party, humans can focus on listening to a desired speaker, an ability known as selective hearing. Current approaches developed to realize computational selective hearing require knowing the position of the target speaker, which limits their practical usage. This article introduces SpeakerBeam, a deep learning based approach for computational selective hearing based on the characteristics of the target speaker's voice. SpeakerBeam requires only a small amount of speech data from the target speaker to compute his/her voice characteristics. It can then extract the speech of that speaker regardless of his/her position or the number of speakers talking in the background.

Keywords

deep learning, target speaker extraction, SpeakerBeam

Authors

DELCROIX, M.; ŽMOLÍKOVÁ, K.; KINOSHITA, K.; ARAKI, S.; OGAWA, A.; NAKATANI, T.

Released

1. 11. 2018

ISSN

1348-3447

Periodical

NTT Technical Review

Volume

16

Number

11

State

Japan

Pages from

19

Pages to

24

Pages count

6

URL

https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201811all.pdf&mode=show_pdf

BibTex

@article{BUT185149,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and KINOSHITA, K. and ARAKI, S. and OGAWA, A. and NAKATANI, T.",
  title="SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics",
  journal="NTT Technical Review",
  year="2018",
  volume="16",
  number="11",
  pages="19--24",
  issn="1348-3447",
  url="https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201811all.pdf&mode=show_pdf"
}

Documents

delcroix_ntt_technical_review_2018_journal paper.pdf
