Flexible Speaker Adaptation using Maximum Likelihood Linear Regression

Transcript and Presenter's Notes

1
Flexible Speaker Adaptation using Maximum
Likelihood Linear Regression
  • Authors: C. J. Leggetter, P. C. Woodland
  • Presenter: ???

Proc. ARPA Spoken Language Technology Workshop,
1995
2
Outline
  • Introduction
  • MLLR Overview
  • Fixed and Dynamic Regression Classes
  • Supervised Adaptation vs. Unsupervised Adaptation
  • Evaluation on WSJ Data
  • Conclusion

3
Introduction
  • Speaker Independent (SI) recognition systems
    • Poor performance
    • Easy to get lots of training data
  • Speaker Dependent (SD) recognition systems
    • Better performance
    • Difficult to get enough training data
  • Solution: adapt the SI system using a small amount of SD data
    • Advantage: only a little SD data is required
    • Problem: some models are not updated (those unseen in the adaptation data)

4
Introduction (aim of the paper)
  • MLLR (Maximum Likelihood Linear Regression) approach
    • A parameter transformation technique
    • All models are updated with only a little adaptation data
    • Adapts the SI system by transforming the mean parameters with a set of linear transforms
  • Dynamic Regression Classes approach
    • Optimizes the adaptation procedure at runtime
    • Allows all modes of adaptation to be performed in a single framework

5
MLLR Overview
  • Regression classes: the set of Gaussians that share the same transformation

Figure: SD (adaptation) data is used to estimate a transformation matrix W for each regression class; the transform is then applied to the mixture components in that class.
6
MLLR Overview (cont.)
The SI mean is augmented to form the extended mean vector $\xi = [1, \mu_1, \ldots, \mu_n]'$, and the SD (adapted) mean is $\hat{\mu} = W\xi$, where $W$ is an $n \times (n+1)$ transformation matrix. Therefore, for a single Gaussian distribution, the probability density function of state $j$ generating a speech observation vector $o$ of dimension $n$ is

$$ b_j(o) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_j|^{1/2}} \exp\left\{ -\frac{1}{2} (o - W_j \xi_j)'\, \Sigma_j^{-1}\, (o - W_j \xi_j) \right\} $$
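A minimal sketch of the mean transformation, assuming NumPy and toy dimensions (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def adapt_means(si_means, W):
    """Apply one regression class's MLLR transform to its SI means.

    si_means : (num_gaussians, n) speaker-independent mean vectors.
    W        : (n, n+1) transformation matrix estimated from adaptation data.
    Returns the adapted means, mu_hat = W @ [1, mu]'.
    """
    num_gaussians, n = si_means.shape
    # Extended mean vectors: prepend a 1 so W can include an offset term.
    xi = np.hstack([np.ones((num_gaussians, 1)), si_means])  # (num_gaussians, n+1)
    return xi @ W.T                                           # (num_gaussians, n)

# Toy check: with W = [0 | I] the adapted means equal the SI means.
si_means = np.arange(12.0).reshape(3, 4)
W = np.hstack([np.zeros((4, 1)), np.eye(4)])
assert np.allclose(adapt_means(si_means, W), si_means)
```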
7
Estimation of MLLR matrices
  • Assume the Gaussian covariance matrices are diagonal
  • A set of T frames of adaptation data: O = o_1 o_2 ... o_T
  • W_j is tied across R Gaussians j_1, j_2, ..., j_R
  • W_j can be updated column by column
8
Estimation of MLLR matrices (cont.)
  • z_i : the i-th column of Z
  • γ_j(t) : the probability of occupying state j at time t while generating O
  • c^(r)_ii : the i-th diagonal element of the r-th tied-state covariance, scaled by the total state occupation probability
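A sketch of the resulting closed-form estimation under the diagonal-covariance assumption, assuming NumPy; the layout and names (estimate_mllr_transform, obs, gammas) are illustrative, not the paper's notation. Each dimension decouples, so W is found by solving one small linear system at a time:

```python
import numpy as np

def estimate_mllr_transform(obs, gammas, means, variances):
    """Estimate one regression class's MLLR mean transform W, shape (n, n+1).

    obs       : (T, n) adaptation frames o_1 .. o_T.
    gammas    : (T, R) occupation probabilities gamma_r(t) for the R tied Gaussians.
    means     : (R, n) speaker-independent means of the tied Gaussians.
    variances : (R, n) diagonal covariances of the tied Gaussians.
    """
    T, n = obs.shape
    R = means.shape[0]
    xi = np.hstack([np.ones((R, 1)), means])      # extended means, (R, n+1)
    inv_var = 1.0 / variances                      # (R, n)
    occ = gammas.sum(axis=0)                       # total occupation per Gaussian, (R,)

    # Z = sum_t sum_r gamma_r(t) * Sigma_r^{-1} * o_t * xi_r'
    Z = np.einsum('tr,ri,ti,rj->ij', gammas, inv_var, obs, xi)   # (n, n+1)

    # Diagonal covariances decouple the dimensions, so W is updated
    # one dimension at a time by solving a small linear system.
    W = np.zeros((n, n + 1))
    for i in range(n):
        # c^(r)_ii: i-th diagonal of the r-th inverse covariance,
        # scaled by that Gaussian's total occupation probability.
        c_ii = occ * inv_var[:, i]                                # (R,)
        G_i = np.einsum('r,rj,rq->jq', c_ii, xi, xi)              # (n+1, n+1)
        W[i] = np.linalg.solve(G_i, Z[i])
    return W
```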
9
MLLR for Incremental Adaptation
  • Can be implemented by accumulating the time-dependent components separately
  • Accumulate the observation vectors associated with each Gaussian and the associated occupation probabilities (see the sketch below)
  • The MLLR equations can then be applied at any time
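A sketch of the accumulator idea, assuming NumPy (names and layout are illustrative):

```python
import numpy as np

class MLLRAccumulator:
    """Running statistics for incremental MLLR adaptation.

    For each Gaussian r we keep sum_t gamma_r(t) and sum_t gamma_r(t) * o_t.
    With diagonal covariances these per-Gaussian sums are enough to rebuild
    Z and the G_i matrices, so the transform can be re-estimated at any time
    without revisiting earlier frames.
    """

    def __init__(self, num_gaussians, dim):
        self.occ = np.zeros(num_gaussians)              # sum_t gamma_r(t)
        self.obs_sum = np.zeros((num_gaussians, dim))   # sum_t gamma_r(t) * o_t

    def add_frame(self, o_t, gamma_t):
        """o_t: (dim,) observation; gamma_t: (num_gaussians,) occupation probs."""
        self.occ += gamma_t
        self.obs_sum += np.outer(gamma_t, o_t)
```

Because only these running sums are needed, the transforms can be refreshed after every few sentences at little extra cost.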

10
Fixed Regression Classes
  • Regression classes are predetermined by assessing the amount of adaptation data
  • Mixture components are clustered using a likelihood-based measure
  • The number of regression classes is roughly proportional to the amount of adaptation data
  • Disadvantages:
    • The adaptation data must be known in advance
    • Some regression classes might not have a sufficient amount of data, giving poor estimates of the transformations
    • A class may be dominated by a specific mixture component

11
Dynamic Regression Classes
  • Mixture components are arranged into a tree
  • Leaves of the tree are:
    • For a small HMM system: individual mixture components
    • For a large HMM system: base classes, each containing a set of mixture components that are similar under a divergence measure
  • Leaves of the tree are then merged into groups of similar components based on a distance measure (divergence); a sketch of how classes are chosen from such a tree follows below
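One simple way to realize the dynamic scheme (an illustrative sketch, not the paper's or HTK's implementation): descend the tree and tie a transform at the deepest nodes whose accumulated adaptation data exceeds a threshold.

```python
class RegressionNode:
    """Node in a regression class tree; leaves hold base classes of Gaussians."""
    def __init__(self, gaussian_ids, children=()):
        self.gaussian_ids = gaussian_ids   # mixture components covered by this node
        self.children = list(children)

def choose_classes(node, occupation, threshold):
    """Pick the deepest nodes at which to estimate a shared MLLR transform.

    occupation maps Gaussian id -> accumulated occupation count from the
    adaptation data. Descend only while every child has enough data, so each
    Gaussian is covered by exactly one chosen node.
    """
    count = lambda n: sum(occupation.get(g, 0.0) for g in n.gaussian_ids)
    if node.children and all(count(c) >= threshold for c in node.children):
        chosen = []
        for child in node.children:
            chosen.extend(choose_classes(child, occupation, threshold))
        return chosen
    return [node]
```

With little adaptation data the recursion stops near the root and a single global transform updates every model; with more data the chosen transforms become increasingly specific.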

12
Supervised Adaptation vs. Unsupervised Adaptation
Note: the fixed regression class approach was used.
Figure: supervised vs. unsupervised adaptation on the RM corpus.
13
Evaluation on WSJ Data
  • Experiment settings:
    • Dynamic regression class approach
    • Baseline speaker-independent system (see Section 5.1 of the paper)
  • S3 test: static supervised adaptation for non-native speakers
  • S4 test: incremental unsupervised adaptation for native speakers

14
Regression Class Tree Settings
  • Distance measure: divergence between mixture components (a sketch of this distance follows below)
  • A clustering algorithm is used to generate 750 base classes:
    • 750 mixture components are chosen as seeds
    • The nearest 10 components are assigned to each base class
    • The remaining components are assigned to base classes using an average distance from all the existing members
  • The regression tree is then built using a similar distance measure
    • Base classes are compared pairwise using the average divergence between all members of each class
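A sketch of a divergence-style distance between diagonal-covariance Gaussians and of the pairwise class comparison; the slides do not spell out the exact formula, so the symmetric KL divergence used here is an assumption:

```python
import numpy as np

def divergence(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    diff_sq = (mean_a - mean_b) ** 2
    return 0.5 * np.sum(var_a / var_b + var_b / var_a - 2.0
                        + diff_sq * (1.0 / var_a + 1.0 / var_b))

def class_distance(class_a, class_b):
    """Average divergence between all member pairs of two base classes.

    Each class is a list of (mean, var) arrays for its mixture components.
    """
    dists = [divergence(ma, va, mb, vb)
             for ma, va in class_a for mb, vb in class_b]
    return sum(dists) / len(dists)
```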

15
S3 Test Results
16
S4 Test Results
Note: increasing the update interval gives a large reduction in adaptation computation with only a small drop in performance.
17
Number of classes vs. number of sentences (S4
Test)
18
Adaptation in Nov94 Hi-P0 HTK System
  • Unsupervised adaptation
  • Adaptation on 15 sentences from each speaker, taken from unfiltered newspaper articles
  • About 15 million parameters in this HMM set
  • Used 750 base classes

19
Conclusion
  • MLLR approach can be used for both static and
    incremental adaptation
  • MLLR approach can be used for both supervised and
    unsupervised adaptation
  • Dynamic regression classes allow all of these modes of adaptation to be performed in a single framework