Eurospeech05 - PowerPoint PPT Presentation

Latent Prosody Analysis for Robust Speaker Identification
Yuan-Fu Liao¹, Zi-He Chen² and Yau-Tarng Juang³
¹National Taipei University of Technology, Taipei, Taiwan
²,³National Central University, Jhongli, Taiwan
Problems
  • Degradation from handset mismatch
  • Unseen handsets in the test phase
  • Effective extraction of prosodic features
  • Sparse data
[Figure: Latent prosody analysis scheme]

Conventional Approaches
  • Spectral feature-based methods
      • Feature domain
      • Model domain
      • Score domain
  • Prosodic modeling methods
      • Pitch and energy distribution
      • Prosodic pattern statistics
      • Prosodic contour dynamics
  • Advantage

Prosodic features are known to carry speaker
information and to be only weakly sensitive to
handset and channel mismatch. This makes them
attractive for further improving the robustness
of handset compensation.
[Figure: Tokenization procedure]
  • Shortcoming

These conventional approaches model the observed
surface prosodic features directly for speaker
discrimination. However, speech prosody is also
affected by many latent factors other than the
speaker, so the observed prosodic features vary
widely. To absorb the influence of these
non-speaker-specific factors, prosodic modeling
requires large amounts of both enrollment and
test data.
Proposed Method
  • Tokenization
      • Automatically extract prosodic features and
        label the prosodic contour as a sequence of
        long-range prosodic cues.
  • Latent prosody analysis (LPA)
      • Decompose the prosodic keyword-speaker
        co-occurrence matrix using LSA or PLSA to
        construct a discriminative latent prosody space
        representing the constellation.
  • Speaker retrieval
      • Project the token sequence of the test utterance
        into the latent prosody space to retrieve the
        most probable enrolled speaker.
[Figure: Latent prosody space]
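The three steps above (tokenization, latent prosody analysis, speaker retrieval) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer (sign of the log-F0 and energy slopes over fixed 10-frame segments), the use of token bigrams as "prosodic keywords", the log(1 + count) weighting, the cosine-similarity retrieval, and all function names (`tokenize`, `keywords`, `build_matrix`, `lsa_basis`, `project`, `identify`, `ramps`) are hypothetical choices. Of the two decompositions named above, only LSA (a truncated SVD) is shown; a PLSA variant would replace `lsa_basis` with an EM-trained aspect model. The training and test contours are made-up toy data.

```python
import numpy as np
from collections import Counter


def tokenize(log_f0, energy, seg_len=10, flat_thr=0.01):
    """Label a prosodic contour as a sequence of discrete prosodic tokens.

    Hypothetical tokenizer: each fixed-length segment becomes a symbol built
    from the sign of the least-squares slope of log-F0 and of energy
    ('+' rising, '-' falling, '0' if |slope| < flat_thr). Assumes unvoiced
    frames have already been interpolated.
    """
    t = np.arange(seg_len)
    tokens = []
    for start in range(0, len(log_f0) - seg_len + 1, seg_len):
        symbol = ""
        for name, contour in (("P", log_f0), ("E", energy)):
            slope = np.polyfit(t, contour[start:start + seg_len], 1)[0]
            sign = "0" if abs(slope) < flat_thr else ("+" if slope > 0 else "-")
            symbol += name + sign
        tokens.append(symbol)
    return tokens


def keywords(tokens):
    """Hypothetical 'prosodic keywords': bigrams of adjacent tokens."""
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


def build_matrix(speaker_keywords):
    """Keyword-by-speaker co-occurrence (count) matrix W."""
    speakers = sorted(speaker_keywords)
    vocab = sorted({kw for kws in speaker_keywords.values() for kw in kws})
    row = {kw: i for i, kw in enumerate(vocab)}
    W = np.zeros((len(vocab), len(speakers)))
    for j, spk in enumerate(speakers):
        for kw, count in Counter(speaker_keywords[spk]).items():
            W[row[kw], j] = count
    return W, vocab, speakers


def lsa_basis(W, k):
    """LSA: truncated SVD of log(1 + W) = U S V^T; keep the first k columns of U."""
    U, _, _ = np.linalg.svd(np.log1p(W), full_matrices=False)
    return U[:, :k]


def project(counts, U_k):
    """Map keyword counts (one vector, or one row per speaker) into the latent space."""
    return np.log1p(counts) @ U_k


def identify(test_keywords, W, vocab, speakers, U_k):
    """Return the enrolled speaker closest (cosine) to the test utterance, plus all scores."""
    row = {kw: i for i, kw in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for kw in test_keywords:
        if kw in row:  # keywords never seen during enrollment are ignored
            q[row[kw]] += 1
    spk_vecs = project(W.T, U_k)  # shape (n_speakers, k)
    q_vec = project(q, U_k)       # shape (k,)
    norms = np.linalg.norm(spk_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12
    sims = spk_vecs @ q_vec / norms
    return speakers[int(np.argmax(sims))], sims


# Toy usage with synthetic piecewise-linear contours (made-up data).
def ramps(slopes, seg_len=10):
    return np.concatenate([s * np.arange(seg_len) for s in slopes])


up, down = 0.05, -0.05
train = {
    "A": keywords(tokenize(ramps([up, down] * 3), ramps([up] * 6))),
    "B": keywords(tokenize(ramps([down] * 6), ramps([up, down] * 3))),
}
W, vocab, speakers = build_matrix(train)
U_k = lsa_basis(W, k=2)
test = keywords(tokenize(ramps([up, down] * 2), ramps([up] * 4)))
best, sims = identify(test, W, vocab, speakers, U_k)
print(best)  # A
```

Scoring both the enrolled speaker columns and the test utterance through the same basis U_k keeps the comparison inside the k-dimensional latent space rather than on raw keyword counts; the rank k is a free parameter here and would need tuning on real data.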

Experiments
  • HTIMIT database
  • ISCSLP2006-SRE database

Conclusions
  • The LPA approach automatically extracts the most
    discriminative prosodic cues to assist spectral
    feature-based speaker identification.
  • The LPA method outperforms the conventional
    methods both when all handsets are counted and
    when only unseen handsets are counted.