Title: Regularized Adaptation for Discriminative Classifiers
1. Regularized Adaptation for Discriminative Classifiers
- Xiao Li and Jeff Bilmes
- University of Washington, Seattle
2. This work
- Investigates links between a number of discriminative classifiers
- Presents a general adaptation strategy: regularized adaptation
3. Adaptation for generative models
- Target sample distribution is different from that of training
- Has long been studied in speech recognition for generative models:
  - Maximum likelihood linear regression
  - Maximum a posteriori
  - Eigenvoice
4. Discriminative classifiers
- Discriminative classifiers
  - Directly model the conditional relation of a label given features
  - Often yield more robust classification performance than generative models
- Popularly used:
  - Support vector machines (SVMs)
  - Multi-layer perceptrons (MLPs)
  - Conditional maximum entropy models
5. Existing Discriminative Adaptation Strategies
- SVMs
  - Combine SVs with selected adaptation data (Matic 93)
  - Combine selected SVs with adaptation data (Li 05)
- MLPs
  - Linear input network (Neto 95, Abrash 97)
  - Retrain both layers from the unadapted model (Neto 95)
  - Retrain part of the last layer (Stadermann 05)
  - Retrain the first layer
- Conditional MaxEnt
  - Gaussian prior (Chelba 04)
6. SVMs and MLPs: Links
- Binary classification: samples $(x_t, y_t)$ with labels $y_t \in \{-1, +1\}$
- Discriminant function: $f(x) = w^\top \phi(x) + b$, where $\phi$ is a nonlinear transform
- Accuracy-regularization objective:
  $\min_w \sum_t Q\big(y_t f(x_t)\big) + \lambda \|w\|^2$
- The regularizer unifies SVM maximum margin, MLP weight decay, and MaxEnt Gaussian smoothing (a sketch follows)
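The shared objective can be made concrete in a short sketch. The following Python is illustrative only: the function names and the linear parametrization are mine, with hinge loss standing in for the SVM case and log loss for the MLP/MaxEnt case.

```python
import numpy as np

def hinge_loss(margins):
    # SVM-style loss: Q(z) = max(0, 1 - z)
    return np.maximum(0.0, 1.0 - margins)

def log_loss(margins):
    # MLP/MaxEnt-style loss: Q(z) = log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -margins)

def regularized_objective(w, b, phi_x, y, Q, lam):
    # Accuracy-regularization objective:
    #   sum_t Q(y_t * f(x_t)) + lam * ||w||^2
    # with f(x) = w . phi(x) + b, labels y in {-1, +1},
    # phi_x: (T, D) transformed features.
    margins = y * (phi_x @ w + b)
    return Q(margins).sum() + lam * np.dot(w, w)
```

Under this view the classifiers differ only in the choice of φ, Q, and the optimizer, which is exactly the table on the next slide.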
7. SVMs and MLPs: Differences

        Nonlinear transform φ    Typical loss func. Q   Typical training
SVMs    Reproducing kernel       Hinge loss             Quadratic prog.
MLPs    Input-to-hidden layer    Log loss               Gradient descent
8. Adaptation
- Adaptation data
  - May be available only in small amounts
  - May be unbalanced across classes
- We intend to utilize
  - The unadapted model $w^0$
  - Adaptation data $(x_t, y_t)$, $t = 1, \dots, T$
9. Regularized Adaptation
- Generalized objective w.r.t. the adaptation data, regularizing toward the unadapted model $w^0$ (sketch below):
  $\min_w \sum_t Q\big(y_t f(x_t)\big) + \lambda \|w - w^0\|^2$
- Relations with existing SVM adaptation algorithms:
  - Hinge loss: retrain an SVM on the adaptation data
  - Hard boosting (Matic 93)
[Figure: illustration of margin errors]
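A minimal sketch of the generalized objective, under the reconstruction above in which the penalty is the squared distance to the unadapted weights; the function name is illustrative:

```python
import numpy as np

def adaptation_objective(w, b, w0, phi_x, y, Q, lam):
    # Regularized adaptation (reconstructed form):
    #   sum_t Q(y_t * f(x_t)) + lam * ||w - w0||^2
    # The penalty pulls the adapted weights toward the unadapted
    # model w0 rather than toward zero, so a small or class-skewed
    # adaptation set cannot drag the model arbitrarily far from it.
    margins = y * (phi_x @ w + b)
    return Q(margins).sum() + lam * np.sum((w - w0) ** 2)
```

Under this form, λ → ∞ recovers the unadapted model and λ → 0 reduces to plain retraining on the adaptation data.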
10. New Regularized Adaptation for SVMs
- Soft boosting: combine margin errors
[Figure: decision boundary and margin errors on the adaptation data]
11. Regularized Adaptation for SVMs (Cont.)
- Theorem, for linear SVMs [statement not recovered in this transcript]
- In practice, we use α = 1 (see the sketch below)
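The theorem body did not survive transcription, but slide 10's soft boosting admits a sketch. This is an assumed reading rather than the paper's exact formulation: margin errors on the adaptation data are combined with margin errors on the unadapted model's support vectors, the latter weighted by α (α = 1 per the slide); all names below are mine.

```python
import numpy as np

def soft_boosting_objective(w, b, X_adapt, y_adapt, X_sv, y_sv, alpha, C):
    # Assumed soft-boosting objective for a linear SVM:
    #   0.5 * ||w||^2 + C * (margin errors on adaptation data
    #                        + alpha * margin errors on the unadapted
    #                          model's support vectors (X_sv, y_sv))
    hinge = lambda m: np.maximum(0.0, 1.0 - m)  # margin error per sample
    adapt_errors = hinge(y_adapt * (X_adapt @ w + b)).sum()
    sv_errors = hinge(y_sv * (X_sv @ w + b)).sum()
    return 0.5 * np.dot(w, w) + C * (adapt_errors + alpha * sv_errors)
```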
12. Regularized Adaptation for MLPs
- Extend this to a two-layer MLP, with separate regularization coefficients µ (last layer) and γ (first layer)
- Relations with existing MLP adaptation algorithms:
  - Linear input network: µ → ∞
  - Retrain from the SI model: µ = 0, γ = 0
  - Retrain the last layer: µ = 0, γ → ∞
  - Retrain the first layer: µ → ∞, γ = 0
  - Regularized: choose µ, γ on a dev set (see the sketch below)
- This also relates to MaxEnt adaptation using Gaussian priors
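A numpy sketch of the two-layer case. The mapping of µ to the last (hidden-to-output) layer and γ to the first (input-to-hidden) layer is inferred from the limiting cases above; the sigmoid hidden units and softmax output are assumptions.

```python
import numpy as np

def mlp_adaptation_objective(V, W, V0, W0, X, y, mu, gamma):
    # Two-layer MLP: hidden h = sigmoid(V x), output p = softmax(W h).
    # Log loss on the adaptation data plus per-layer penalties toward
    # the speaker-independent weights (V0, W0):
    #   loss + gamma * ||V - V0||^2 + mu * ||W - W0||^2
    # Limiting cases recover the algorithms listed above:
    #   mu = gamma = 0       -> retrain both layers from the SI model
    #   mu = 0, gamma -> inf -> first layer frozen (retrain last layer)
    #   mu -> inf, gamma = 0 -> last layer frozen (retrain first layer)
    H = 1.0 / (1.0 + np.exp(-(X @ V.T)))             # (T, n_hidden)
    logits = H @ W.T                                 # (T, n_classes)
    logits -= logits.max(axis=1, keepdims=True)      # stable softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(y)), y].sum()     # y: integer labels
    return (nll
            + gamma * np.sum((V - V0) ** 2)
            + mu * np.sum((W - W0) ** 2))
```

Minimizing this with µ and γ tuned on a dev set is the regularized strategy evaluated in the experiments that follow.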
13. Experiments: Vowel Classification
- Application: the Vocal Joystick
  - A voice-based computer interface for individuals with motor impairments
  - Vowel quality → angle
- Data set (extended)
  - Train/dev/eval: 21/4/10 speakers
  - 6-fold cross-validation
- MLP configuration
  - 7 frames of MFCC deltas as input (stacking sketched below)
  - 50 hidden nodes
- Metric: frame-level classification error rate
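The input construction (7 frames of MFCC deltas) can be sketched as a context-window stack. The centered window and edge padding are assumptions, as is the function name:

```python
import numpy as np

def stack_context(frames, context=7):
    # frames: (T, D) per-frame MFCC-delta features.
    # Returns (T, context * D): each MLP input is a window of `context`
    # consecutive frames centered on frame t, repeating edge frames.
    half = context // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + context].ravel()
                     for t in range(frames.shape[0])])
```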
14. Varying Adaptation Time

Frame-level error rates (%); row labels for the five adaptation methods did not survive transcription:

        4-class                              8-class
        1s          2s          3s           1s           2s           3s
SI      7.60±0.08   7.60±0.08   7.60±0.08    32.02±0.31   32.02±0.31   32.02±0.31
?       1.16        0.41        0.34         13.52        11.81        11.96
?       1.63        0.21        0.53         12.15        9.64         7.88
?       2.93        1.66        1.91         15.45        13.32        11.40
?       0.79        0.23        0.12         11.56        9.12         7.35
?       0.22        0.19        0.12         11.56        8.16         7.30
15-16. Varying vowels in adaptation (3s each)
[Figure: error rates by adapted vowel set; SI baseline ≈ 32%]
17-18. Varying vowels in adaptation (3s total)
[Figure: error rates by adapted vowel set; SI baseline ≈ 32%]
19. Summary
- Drew links between discriminative classifiers
- Presented a general notion of regularized adaptation for discriminative classifiers
  - Natural adaptation strategies for SVMs and MLPs, justified using a maximum margin argument
  - A unified view of different adaptation algorithms
- MLP experiments show superior performance, especially for class-skewed data