Title: Detection of Target Speakers in Audio Databases
Ivan Magrin-Chagnolleau, Aaron E. Rosenberg, and S. Parthasarathy
Rice University, Houston, Texas - AT&T Labs-Research, Florham Park, New Jersey
ivan_at_ieee.org - aer_at_research.att.com - sps_at_research.att.com
Problem and Definitions
- Data: broadcast-band audio from television news programs, containing speech segments from a variety of speakers, segments of mixed speech and music (typically commercials), and music-only segments. Speech segments may vary in quality and may be contaminated by background music, speech, and/or noise.
- Speaker detection task: locate and label the segments of designated speakers (target speakers) in the data.
- Overall goal: aid information retrieval from large multimedia databases.
- Assumption: segmented and labeled training data exist for the target speakers, for other speakers, and for other audio material.
Detection algorithm
- log-likelihood ratio
- smoothed log-likelihood ratio, computed every … vectors, with … (1 s) and … (0.2 s)
- segmentation algorithm
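The detection steps listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: single diagonal-covariance Gaussians stand in for the target and background models (the poster does not state the model type), a 10 ms frame rate is assumed so that 1 s is about 100 vectors and 0.2 s about 20 vectors, the 1 s value is assumed to be the averaging window and the 0.2 s value the shift, and the segmentation step is a hypothetical threshold-and-merge rule. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stand-in speaker model: (means, variances) of a diagonal Gaussian.
Model = Tuple[Sequence[float], Sequence[float]]


def diag_gauss_loglik(x: Sequence[float], model: Model) -> float:
    """Log-density of feature vector x under a diagonal-covariance Gaussian."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def frame_llr(frames: Sequence[Sequence[float]], target: Model, background: Model) -> List[float]:
    """Per-vector log-likelihood ratio: log p(x_t | target) - log p(x_t | background)."""
    return [diag_gauss_loglik(x, target) - diag_gauss_loglik(x, background) for x in frames]


def smoothed_llr(llr: Sequence[float], window: int, shift: int) -> List[Tuple[int, float]]:
    """Mean LLR over `window` vectors, evaluated every `shift` vectors.

    Returns (start_index, mean_llr) pairs; only full windows are scored."""
    return [
        (start, sum(llr[start:start + window]) / window)
        for start in range(0, len(llr) - window + 1, shift)
    ]


def segment(scores: Sequence[Tuple[int, float]], window: int, shift: int,
            threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Hypothetical segmentation rule: accept windows whose smoothed LLR exceeds
    `threshold`, credit each accepted window to the `shift` vectors around its
    centre, and merge adjacent credited spans into [start, end) segments."""
    segments: List[Tuple[int, int]] = []
    for start, score in scores:
        if score <= threshold:
            continue
        lo = start + window // 2 - shift // 2
        hi = lo + shift
        if segments and lo <= segments[-1][1]:
            segments[-1] = (segments[-1][0], hi)  # extend the current segment
        else:
            segments.append((lo, hi))  # open a new segment
    return segments


if __name__ == "__main__":
    # Synthetic 1-D "features": 300 background vectors, 200 target vectors, 300 background.
    frames = [[-1.0]] * 300 + [[1.0]] * 200 + [[-1.0]] * 300
    target, background = ([1.0], [1.0]), ([-1.0], [1.0])
    llr = frame_llr(frames, target, background)
    # Assumed 10 ms frames: window = 100 vectors (1 s), shift = 20 vectors (0.2 s).
    scores = smoothed_llr(llr, window=100, shift=20)
    print(segment(scores, window=100, shift=20))  # → [(300, 500)]
```

With these toy models the per-vector ratio is simply 2x, so the smoothed score is positive exactly when more than half of a 1 s window comes from the target region. Crediting each accepted window only to the 0.2 s span around its centre keeps the boundary resolution at the shift length rather than the window length.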
Future directions
- Use more than one model for each target speaker.
- Use more background models.
- Study performance as a function of the smoothing parameters and the segmentation algorithm parameters.
- Use a new post-processor to find the best path through a speaker lattice.
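Finding the best path through a speaker lattice is commonly done with Viterbi-style dynamic programming. The sketch below is one hypothetical version, not the authors' post-processor: the lattice is simplified to one score per model per window (for example, smoothed log-likelihood ratios), and `switch_penalty` is an assumed parameter that discourages rapid label changes. The function name and labels are hypothetical.

```python
from typing import Dict, List, Sequence


def best_path(window_scores: Sequence[Dict[str, float]], switch_penalty: float) -> List[str]:
    """Viterbi search for one label per window that maximizes the summed
    window scores minus `switch_penalty` for every change of label."""
    labels = list(window_scores[0])
    best = dict(window_scores[0])  # best cumulative score ending in each label
    backpointers: List[Dict[str, str]] = []
    for scores in window_scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for lab in labels:
            # Staying with the same label is free; switching costs switch_penalty.
            entry = {p: best[p] - (0.0 if p == lab else switch_penalty) for p in labels}
            prev = max(entry, key=entry.get)
            new_best[lab] = entry[prev] + scores[lab]
            pointers[lab] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best-scoring final label.
    lab = max(labels, key=lambda l: best[l])
    path = [lab]
    for pointers in reversed(backpointers):
        lab = pointers[lab]
        path.append(lab)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical per-window scores for two target speakers and a background model.
    lattice = [
        {"target_A": 1.0, "target_B": 0.0, "background": 0.0},
        {"target_A": 1.0, "target_B": 0.0, "background": 0.0},
        {"target_A": 0.0, "target_B": 0.5, "background": 0.0},
        {"target_A": 1.0, "target_B": 0.0, "background": 0.0},
        {"target_A": 0.0, "target_B": 1.0, "background": 0.0},
        {"target_A": 0.0, "target_B": 1.0, "background": 0.0},
    ]
    print(best_path(lattice, switch_penalty=1.0))
```

With a penalty of 1.0 the brief target_B score in the third window is absorbed into the surrounding target_A run, giving four target_A windows followed by two target_B windows; with `switch_penalty=0.0` the search reduces to picking the best label in each window independently.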
Note 1: This work was done while the first author was with AT&T Labs-Research.
Note 2: The first author would like to thank Rice University for financing his conference participation.