Title: Eurospeech05
Latent Prosody Analysis for Robust Speaker Identification
Yuan-Fu Liao (1), Zi-He Chen (2) and Yau-Tarng Juang (3)
(1) National Taipei University of Technology, Taipei, Taiwan
(2,3) National Central University, Jhongli, Taiwan
Problems
- Degradation from handset mismatch
- Unseen handsets in the test phase
- Effective extraction of prosodic features
- Sparse data
Figure: Latent prosody analysis scheme
Conventional Approaches
- Spectral feature-based methods
  - Feature domain
  - Model domain
  - Score domain
- Prosodic modeling methods
  - Pitch and energy distribution
  - Prosodic pattern statistics
  - Prosodic contour dynamics
Prosodic features are known to carry speaker
information and to be only weakly sensitive to
handset and channel mismatch. They are therefore
attractive for further addressing the handset
robustness issue.
Figure: Tokenization procedure
The conventional prosodic modeling approaches try
to model the observed surface prosodic features
directly for speaker discrimination. However, the
behavior of speech prosody is also affected by many
latent factors other than the speaker, and the
variability of the observed prosodic features is
quite large. To absorb the influence of those
non-speaker-specific factors on prosodic modeling,
large amounts of both enrollment and test data are
required.
Proposed method
- Tokenization
  - Automatically extract prosodic features and label
    the prosodic contour as a sequence of
    long-range prosodic cues.
- Latent prosody analysis (LPA)
  - Decompose the prosodic keyword-speaker
    co-occurrence matrix using LSA or PLSA to
    construct a discriminative latent prosody space
    representing the constellation.
- Speaker retrieval
  - Project the sequence of the test utterance
    into the latent prosody space to retrieve the
    most probable registered speaker.
Figure: Latent prosody space
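The latent prosody analysis and speaker retrieval steps above can be illustrated with a small sketch. It covers only the LSA variant, implemented as a truncated SVD with NumPy. The toy keyword labels, the raw-count weighting, the fold-in projection, the cosine similarity score, and all function names are assumptions made for illustration; the poster does not specify them.

```python
from collections import Counter

import numpy as np

# Hypothetical toy enrollment data: one prosodic keyword sequence per speaker.
# The labels "rise", "fall", "flat" are placeholders, not the poster's cue set.
enrollment = {
    "spk_a": ["rise", "rise", "fall", "rise", "flat", "rise"],
    "spk_b": ["fall", "fall", "rise", "fall", "flat", "fall"],
    "spk_c": ["flat", "flat", "rise", "flat", "fall", "flat", "rise"],
}


def count_vector(tokens, vocab):
    """Raw keyword counts for one token sequence (weighting is an assumption)."""
    counts = Counter(tokens)
    return np.array([counts[w] for w in vocab], dtype=float)


def cooccurrence(enrollment):
    """Build the keyword-by-speaker co-occurrence matrix."""
    speakers = sorted(enrollment)
    vocab = sorted({t for s in speakers for t in enrollment[s]})
    A = np.column_stack([count_vector(enrollment[s], vocab) for s in speakers])
    return A, vocab, speakers


def build_latent_space(A, rank):
    """LSA: keep the top `rank` left singular vectors as latent axes."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :rank]
    # Each speaker column of A projected onto the latent axes (one row each).
    speaker_vecs = (U_k.T @ A).T
    return U_k, speaker_vecs


def retrieve(tokens, U_k, speaker_vecs, vocab, speakers):
    """Project a test sequence and rank speakers by cosine similarity."""
    q_hat = U_k.T @ count_vector(tokens, vocab)
    norms = np.linalg.norm(speaker_vecs, axis=1) * np.linalg.norm(q_hat)
    sims = speaker_vecs @ q_hat / (norms + 1e-12)
    return speakers[int(np.argmax(sims))], sims


A, vocab, speakers = cooccurrence(enrollment)
U_k, speaker_vecs = build_latent_space(A, rank=2)
best, sims = retrieve(["rise", "flat", "rise", "rise"],
                      U_k, speaker_vecs, vocab, speakers)
print(best, dict(zip(speakers, np.round(sims, 3))))
```

Projecting the test counts with the same latent axes as the enrolled speakers keeps both in one coordinate system, so a test sequence whose keyword counts match a speaker's enrollment counts scores a cosine similarity of 1 for that speaker. A PLSA variant would replace the SVD with an EM-trained aspect model and is not shown here.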
Experiments
Conclusions
- The LPA approach automatically extracts the most
  discriminative prosodic cues to assist spectral
  feature-based speaker identification.
- The LPA method outperforms the conventional
  methods both when all handsets are counted and
  when only unseen handsets are counted.