Eurospeech05 presentation

About This Presentation

Title:

Eurospeech05

Description:

Decompose the prosodic keyword-speaker co-occurrence matrix using LSA or PLSA to ... Prosodic features, which are known to carry the speaker information and to be ...


Transcript and Presenter's Notes

Latent Prosody Analysis for Robust Speaker Identification
Yuan-Fu Liao (1), Zi-He Chen (2) and Yau-Tarng Juang (3)
(1) National Taipei University of Technology, Taipei, Taiwan
(2, 3) National Central University, Jhongli, Taiwan

Problems

Degradation from handset mismatch
Unseen handsets in the test phase
Effective extraction of prosodic features
Sparse data

[Panel: Latent prosody analysis scheme]

Conventional Approaches

Spectral feature-based methods
  Feature domain
  Model domain
  Score domain
Prosodic modeling methods
  Pitch and energy distribution
  Prosodic pattern statistics
  Prosodic contour dynamics

Advantage

Prosodic features are known to carry speaker information and to be only
weakly sensitive to handset and channel mismatch. This makes them
attractive for further addressing the robustness issue in handset
compensation.

[Panel: Tokenization procedure]

Shortcoming

These conventional approaches directly model the observed surface
prosodic features for speaker discrimination. However, speech prosody
is also affected by many latent factors other than the speaker, and the
variability of the observed prosodic features is quite large. To absorb
the influence of these non-speaker-specific factors on prosodic
modeling, large amounts of both enrollment and test data are required.

Proposed method

Tokenization
  Automatically extract prosodic features and label the prosodic
  contour as a sequence of long-range prosodic cues.
Latent prosody analysis (LPA)
  Decompose the prosodic keyword-speaker co-occurrence matrix using
  LSA or PLSA to construct a discriminative latent prosody space
  representing the constellation.
Speaker retrieval
  Project the prosodic cue sequence of the test utterance into the
  latent prosody space to retrieve the most probable registered
  speaker.

[Panel: Latent prosody space]
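
The LPA and speaker retrieval steps of the proposed method can be sketched roughly as follows. This is only a minimal illustration of the LSA variant, not the authors' implementation: the keyword labels (`rise`, `fall`, `flat`, `reset`), the speaker names, the toy enrollment sequences, and the helper names `count_vector`, `train_lsa`, and `retrieve` are all hypothetical. The tokenizer is assumed to exist already, raw counts are used without any weighting, and cosine similarity is an assumed choice of retrieval score. PLSA is not shown.

```python
import numpy as np

# Hypothetical toy data: one prosodic keyword sequence per enrolled speaker,
# as it might come out of a tokenizer (which is assumed, not implemented here).
enrollment = {
    "spk_A": ["rise", "rise", "rise", "fall", "flat", "rise", "reset"],
    "spk_B": ["fall", "fall", "flat", "fall", "reset", "fall", "fall"],
    "spk_C": ["flat", "reset", "flat", "flat", "rise", "reset"],
}


def count_vector(seq, vocab):
    """Count how often each known prosodic keyword occurs in a sequence."""
    v = np.zeros(len(vocab))
    for tok in seq:
        if tok in vocab:  # keywords unseen at enrollment are ignored
            v[vocab[tok]] += 1.0
    return v


def train_lsa(enrollment, rank):
    """Build the keyword-speaker matrix and keep its top `rank` SVD factors."""
    speakers = sorted(enrollment)
    keywords = sorted({tok for seq in enrollment.values() for tok in seq})
    vocab = {tok: i for i, tok in enumerate(keywords)}
    # Rows are prosodic keywords, columns are speakers.
    M = np.stack([count_vector(enrollment[s], vocab) for s in speakers], axis=1)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    U_k = U[:, :rank]
    # Speaker coordinates in the latent space: rows of (U_k^T M)^T,
    # which equals (diag(S_k) Vt_k)^T.
    speaker_coords = (U_k.T @ M).T
    return speakers, vocab, U_k, speaker_coords


def retrieve(test_seq, speakers, vocab, U_k, speaker_coords):
    """Project a test sequence and return the closest speaker by cosine."""
    q = U_k.T @ count_vector(test_seq, vocab)
    norms = np.linalg.norm(speaker_coords, axis=1) * np.linalg.norm(q)
    sims = speaker_coords @ q / (norms + 1e-12)  # guard against zero vectors
    return speakers[int(np.argmax(sims))], sims


speakers, vocab, U_k, coords = train_lsa(enrollment, rank=2)
test_utterance = ["rise", "rise", "flat", "rise"]  # hypothetical test tokens
best, sims = retrieve(test_utterance, speakers, vocab, U_k, coords)
print(best, np.round(sims, 3))
```

Keeping fewer latent dimensions than there are speakers (`rank=2` here) is what distinguishes this from comparing raw keyword counts directly. In practice the rank, the keyword inventory, and any count weighting would need to be tuned on held-out data.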

Conclusions

The LPA approach automatically extracts the most discriminative
prosodic cues to assist spectral feature-based speaker identification.
The LPA method outperforms the conventional methods both when all
handsets are counted and when only unseen handsets are counted.

[Panel: Experiments]