Flexible Speaker Adaptation using Maximum Likelihood Linear Regression

1
Flexible Speaker Adaptation using Maximum
Likelihood Linear Regression

Authors: C. J. Leggetter, P. C. Woodland
Presenter: ???

Proc. ARPA Spoken Language Technology Workshop, 1995
2
Outline

Introduction
MLLR Overview
Fixed and Dynamic Regression Classes
Supervised Adaptation vs. Unsupervised Adaptation
Evaluation on WSJ Data
Conclusion

3
Introduction

Speaker Independent (SI) recognition systems: poorer performance, but easy to get lots of training data
Speaker Dependent (SD) recognition systems: better performance, but difficult to get enough training data
Solution: adapt the SI system using a small amount of SD data
Advantage: little SD data is required
Problem: with little data, some models are not updated

4
Introduction (aim of the paper)

MLLR (Maximum Likelihood Linear Regression) approach
Parameter transformation technique
All models are updated, even with little adaptation data
Adapts the SI system by transforming the mean parameters with a set of linear transforms
Dynamic regression classes approach
Optimizes the adaptation procedure at run time
Allows all modes of adaptation to be performed in a single framework

5
MLLR Overview

Regression classes
A regression class is the set of Gaussians that share the same transformation

Figure: the SD data is used to estimate a transformation matrix (W) for each regression class; W then transforms the mixture components in that class
6
MLLR Overview (cont.)
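SI mean: $\mu_j$, extended to $\xi_j = [\omega, \mu_{j1}, \ldots, \mu_{jn}]'$, where $\omega$ is the offset term
SD mean: $\hat{\mu}_j = W_j \xi_j$, where $W_j$ is an $n \times (n+1)$ transformation matrix
Therefore, for a single Gaussian distribution, the probability density function of state j generating a speech observation vector o of dimension n is
$$b_j(o) = \frac{1}{(2\pi)^{n/2} |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2}(o - W_j \xi_j)' \Sigma_j^{-1} (o - W_j \xi_j)\right)$$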
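As an illustration of the mean transformation, here is a minimal NumPy sketch. It assumes the usual MLLR convention in which each mean is extended with an offset term before being multiplied by W, and it assumes diagonal covariances for the log density. The function names `adapt_means` and `log_density` are hypothetical and not taken from the presentation.

```python
import numpy as np

def adapt_means(W, means, offset=1.0):
    """Apply one MLLR transform to a set of Gaussian means.

    W:     (n, n+1) transformation matrix shared by the regression class
    means: (R, n) speaker-independent means, one row per Gaussian
    Returns the (R, n) adapted means W @ [offset, mean].
    """
    means = np.asarray(means, dtype=float)
    # Extended mean vectors: prepend the offset term to every mean.
    xi = np.hstack([np.full((means.shape[0], 1), offset), means])
    return xi @ np.asarray(W, dtype=float).T

def log_density(obs, mean, variance):
    """Log of a diagonal-covariance Gaussian density evaluated at obs."""
    obs, mean, variance = (np.asarray(x, dtype=float) for x in (obs, mean, variance))
    n = obs.shape[0]
    return -0.5 * (n * np.log(2.0 * np.pi)
                   + np.sum(np.log(variance))
                   + np.sum((obs - mean) ** 2 / variance))
```

With W set to a zero offset column followed by the identity matrix, `adapt_means` returns the means unchanged, which is a convenient sanity check before plugging in an estimated transform.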
7
Estimation of MLLR matrices
Gaussian covariance matrices are diagonal
A set of T frames of adaptation data: $O = o_1, o_2, \ldots, o_T$
$W_j$ is tied between R Gaussians $j_1, j_2, \ldots, j_R$
$W_j$ can be updated column by column
8
Estimation of MLLR matrices (cont.)
$z_i$: the ith column of Z
$\gamma_j(t)$: the probability of occupying state j at time t while generating O
$c^{(r)}_{ii}$: the ith diagonal element of the rth tied state covariance, scaled by the total state occupation probability
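The estimation step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes diagonal covariances, an offset term of 1, and the convention that the adapted mean is W times the extended mean, under which each row of W is obtained from its own small linear system (the "column" wording on the slides may reflect a transposed convention). The function name `estimate_mllr_transform` and its argument layout are hypothetical.

```python
import numpy as np

def estimate_mllr_transform(means, variances, gammas, obs):
    """Estimate one MLLR matrix W shared by R tied Gaussians.

    means:     (R, n) speaker-independent means
    variances: (R, n) diagonal covariance entries
    gammas:    (T, R) occupation probabilities per frame and Gaussian
    obs:       (T, n) adaptation observation vectors
    Returns W with shape (n, n+1).
    """
    R, n = means.shape
    xi = np.hstack([np.ones((R, 1)), means])   # extended means, (R, n+1)
    occ = gammas.sum(axis=0)                   # total occupation per Gaussian
    weighted_obs = gammas.T @ obs              # occupation-weighted observation sums, (R, n)
    W = np.zeros((n, n + 1))
    for i in range(n):
        inv_var = 1.0 / variances[:, i]
        # G accumulates xi xi' weighted by occupation / variance in dimension i.
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # z accumulates the weighted observations against the extended means.
        z = (weighted_obs[:, i] * inv_var) @ xi
        W[i] = np.linalg.solve(G, z)
    return W
```

The solve step needs G to be invertible, which in practice means the regression class must contain more than n distinct Gaussians with non-zero occupation.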
9
MLLR for Incremental Adaptation

Can be implemented by accumulating the time-dependent components separately
Accumulate the observation vectors associated with each Gaussian, together with the associated occupation probabilities
The MLLR equations can then be applied at any time
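A minimal sketch of this accumulate-then-estimate pattern is given below. It assumes diagonal covariances, an offset term of 1, and the convention that the adapted mean is W times the extended mean. The class name `MllrAccumulator` and its methods are hypothetical names chosen for illustration.

```python
import numpy as np

class MllrAccumulator:
    """Keeps the time-dependent MLLR statistics for one regression class."""

    def __init__(self, means, variances):
        self.means = np.asarray(means, dtype=float)          # (R, n)
        self.variances = np.asarray(variances, dtype=float)  # (R, n)
        R, n = self.means.shape
        self.occ = np.zeros(R)            # summed occupation probabilities
        self.obs_sum = np.zeros((R, n))   # occupation-weighted observation sums

    def accumulate(self, gammas, obs):
        """Add a batch of frames: gammas is (T, R), obs is (T, n)."""
        gammas = np.asarray(gammas, dtype=float)
        obs = np.asarray(obs, dtype=float)
        self.occ += gammas.sum(axis=0)
        self.obs_sum += gammas.T @ obs

    def estimate(self):
        """Solve for W from whatever has been accumulated so far."""
        R, n = self.means.shape
        xi = np.hstack([np.ones((R, 1)), self.means])
        W = np.zeros((n, n + 1))
        for i in range(n):
            inv_var = 1.0 / self.variances[:, i]
            G = (xi * (self.occ * inv_var)[:, None]).T @ xi
            z = (self.obs_sum[:, i] * inv_var) @ xi
            W[i] = np.linalg.solve(G, z)
        return W
```

Because only sums are stored, feeding the data in one batch or in several smaller batches leads to the same transform, and `estimate` can be called whenever an updated transform is wanted.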

10
Fixed Regression Classes

Regression classes are predetermined by assessing the amount of adaptation data
Mixture component clustering procedure based on a likelihood measure
The number of regression classes is roughly proportional to the amount of adaptation data
Disadvantages:
The amount of adaptation data must be known in advance
Some regression classes might not have a sufficient amount of data, giving poor estimates of the transformations
A class may be dominated by a specific mixture component

11
Dynamic Regression Classes

Mixture components are arranged into a tree
The leaves of the tree are:
For a small HMM system: individual mixture components
For a large HMM system: base classes, each containing a set of mixture components that are similar under a divergence measure
Leaves are then merged into groups of similar components based on a distance measure (divergence)
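The slides do not spell out how the tree is used at run time. One common reading, assumed here, is that each leaf takes its transform from the lowest node on its path to the root that has gathered enough adaptation data, with the root as a fallback. The sketch below illustrates that rule; the names `Node`, `total_occupancy`, `choose_classes` and the threshold value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    name: str
    occupancy: float = 0.0                      # adaptation data seen (leaves only)
    children: List["Node"] = field(default_factory=list)

def total_occupancy(node: Node) -> float:
    """Occupancy of a node is the sum over the leaves below it."""
    if not node.children:
        return node.occupancy
    return sum(total_occupancy(child) for child in node.children)

def choose_classes(node: Node, threshold: float,
                   inherited: Optional[str] = None,
                   out: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map every leaf to the node whose transform it should use."""
    if out is None:
        out = {}
    # The root is always usable; deeper nodes only if they have enough data.
    if inherited is None or total_occupancy(node) >= threshold:
        inherited = node.name
    if not node.children:
        out[node.name] = inherited
    else:
        for child in node.children:
            choose_classes(child, threshold, inherited, out)
    return out

tree = Node("root", children=[
    Node("A", children=[Node("a1", 120.0), Node("a2", 30.0)]),
    Node("B", children=[Node("b1", 10.0), Node("b2", 5.0)]),
])
assignment = choose_classes(tree, threshold=100.0)
```

In this toy tree, leaf a1 has enough data for its own transform, a2 shares the transform of its parent A, and both leaves under B fall back to the root transform. As more data arrives the same call yields more specific classes, which is the sense in which the classes are dynamic.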

12
Supervised Adaptation vs. Unsupervised Adaptation
Note: the fixed regression class approach was used
Figure: supervised vs. unsupervised adaptation using the RM corpus
13
Evaluation on WSJ Data

Experiment settings
Dynamic regression classes approach
Baseline: Speaker Independent system (see Section 5.1 of the paper)
S3 test: static supervised adaptation for non-native speakers
S4 test: incremental unsupervised adaptation for native speakers

14
Regression Class Tree Settings

Distance measure: divergence between mixture components
A clustering algorithm generates 750 base classes:
750 mixture components are chosen, one per base class
The nearest 10 components are assigned to each base class
The remaining components are assigned to base classes using the average distance from all existing members
The regression tree is then built using a similar distance measure
Base classes are compared pair-wise using the average divergence between all members of each class
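The distance computations above can be sketched as follows. The slides do not give the exact divergence formula, so this example assumes the symmetric Kullback-Leibler divergence between diagonal-covariance Gaussians. The function names `divergence`, `nearest_class` and `class_distance` are hypothetical.

```python
def divergence(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal Gaussians (assumed form)."""
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        total += va / vb + vb / va - 2.0 + (ma - mb) ** 2 * (1.0 / va + 1.0 / vb)
    return 0.5 * total

def nearest_class(component, classes):
    """Pick the base class with the smallest average divergence to its members.

    component: (mean, variance) pair
    classes:   dict mapping class name to a list of (mean, variance) members
    """
    def average_distance(members):
        return sum(divergence(*component, *m) for m in members) / len(members)
    return min(classes, key=lambda name: average_distance(classes[name]))

def class_distance(members_a, members_b):
    """Average divergence over all cross pairs of two base classes."""
    pairs = [(a, b) for a in members_a for b in members_b]
    return sum(divergence(*a, *b) for a, b in pairs) / len(pairs)
```

A tree builder would repeatedly merge the pair of classes with the smallest `class_distance`; that loop is omitted here because the slides only describe the pair-wise comparison.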

15
S3 Test Results
16
S4 Test Results
Note: increasing the update interval gives a large reduction in adaptation computation with only a small drop in performance
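The role of the update interval in incremental unsupervised adaptation can be pictured with the loop below. It is only a sketch: `recognise`, `accumulate` and `update_transforms` are hypothetical callables standing in for the recogniser, the statistics accumulation and the transform re-estimation, none of which are specified on the slides.

```python
def incremental_adaptation(sentences, recognise, accumulate, update_transforms, interval):
    """Recognise each sentence, accumulate statistics, and re-estimate
    the transforms once every `interval` sentences.

    Returns the list of hypotheses and the number of transform updates.
    """
    hypotheses = []
    updates = 0
    for index, sentence in enumerate(sentences, start=1):
        # Unsupervised: the recogniser output stands in for the transcription.
        hypothesis = recognise(sentence)
        hypotheses.append(hypothesis)
        accumulate(sentence, hypothesis)
        if index % interval == 0:
            update_transforms()
            updates += 1
    return hypotheses, updates
```

With ten sentences, an interval of 1 triggers ten re-estimations while an interval of 5 triggers two, which is where the saving in adaptation computation comes from.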
17
Number of classes vs. number of sentences (S4 Test)
18
Adaptation in Nov94 Hi-P0 HTK System