Title: Regularized Adaptation for Discriminative Classifiers
1. Regularized Adaptation for Discriminative Classifiers
- Xiao Li and Jeff Bilmes
- University of Washington, Seattle
2. This work
- Investigates links between a number of discriminative classifiers
- Presents a general adaptation strategy: regularized adaptation
3. Adaptation for generative models
- Target sample distribution is different from that of training
- Has long been studied in speech recognition for generative models:
  - Maximum likelihood linear regression
  - Maximum a posteriori
  - Eigenvoice
4. Discriminative classifiers
- Discriminative classifiers
  - Directly model the conditional relation of a label given features
  - Often yield more robust classification performance than generative models
- Popularly used:
  - Support vector machines (SVMs)
  - Multi-layer perceptrons (MLPs)
  - Conditional maximum entropy models
5. Existing Discriminative Adaptation Strategies
- SVMs
  - Combine SVs with selected adaptation data (Matic 93)
  - Combine selected SVs with adaptation data (Li 05)
- MLPs
  - Linear input network (Neto 95, Abrash 97)
  - Retrain both layers from the unadapted model (Neto 95)
  - Retrain part of the last layer (Stadermann 05)
  - Retrain the first layer
- Conditional MaxEnt
  - Gaussian prior (Chelba 04)
6. SVMs and MLPs: Links
- Binary classification: samples $(x_t, y_t)$ with labels $y_t \in \{-1, +1\}$
- Discriminant function: $f(x) = w^\top \phi(x) + b$, where $\phi$ is a nonlinear transform
- Accuracy-regularization objective:
  $\min_w \sum_t Q\big(y_t f(x_t)\big) + \lambda \|w\|^2$
- The regularizer unifies SVM maximum margin, MLP weight decay, and MaxEnt Gaussian smoothing (a sketch follows)
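The shared objective can be made concrete in a short sketch. The following Python is illustrative only: the function names and the linear parametrization are mine, with hinge loss standing in for the SVM case and log loss for the MLP/MaxEnt case.

```python
import numpy as np

def hinge_loss(margins):
    # SVM-style loss: Q(z) = max(0, 1 - z)
    return np.maximum(0.0, 1.0 - margins)

def log_loss(margins):
    # MLP/MaxEnt-style loss: Q(z) = log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -margins)

def regularized_objective(w, b, phi_x, y, Q, lam):
    # Accuracy-regularization objective:
    #   sum_t Q(y_t * f(x_t)) + lam * ||w||^2
    # with f(x) = w . phi(x) + b, labels y in {-1, +1},
    # phi_x: (T, D) transformed features.
    margins = y * (phi_x @ w + b)
    return Q(margins).sum() + lam * np.dot(w, w)
```

Under this view the classifiers differ only in the choice of φ, Q, and the optimizer, which is exactly the table on the next slide.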
7. SVMs and MLPs: Differences

        Nonlinear transform φ    Typical loss func. Q   Typical training
SVMs    Reproducing kernel       Hinge loss             Quadratic prog.
MLPs    Input-to-hidden layer    Log loss               Gradient descent
8. Adaptation
- Adaptation data
  - May be available only in small amounts
  - May be unbalanced across classes
- We intend to utilize
  - The unadapted model $w^0$
  - Adaptation data $(x_t, y_t)$, $t = 1, \dots, T$
9. Regularized Adaptation
- Generalized objective w.r.t. the adaptation data, regularizing toward the unadapted model $w^0$ (sketch below):
  $\min_w \sum_t Q\big(y_t f(x_t)\big) + \lambda \|w - w^0\|^2$
- Relations with existing SVM adaptation algorithms:
  - Hinge loss: retrain an SVM on the adaptation data
  - Hard boosting (Matic 93)
[Figure: illustration of margin errors]
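A minimal sketch of the generalized objective, under the reconstruction above in which the penalty is the squared distance to the unadapted weights; the function name is illustrative:

```python
import numpy as np

def adaptation_objective(w, b, w0, phi_x, y, Q, lam):
    # Regularized adaptation (reconstructed form):
    #   sum_t Q(y_t * f(x_t)) + lam * ||w - w0||^2
    # The penalty pulls the adapted weights toward the unadapted
    # model w0 rather than toward zero, so a small or class-skewed
    # adaptation set cannot drag the model arbitrarily far from it.
    margins = y * (phi_x @ w + b)
    return Q(margins).sum() + lam * np.sum((w - w0) ** 2)
```

Under this form, λ → ∞ recovers the unadapted model and λ → 0 reduces to plain retraining on the adaptation data.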
10. New Regularized Adaptation for SVMs
- Soft boosting: combine margin errors
[Figure: decision boundary and margin errors on the adaptation data]
11. Regularized Adaptation for SVMs (Cont.)
- Theorem, for linear SVMs [statement not recovered in this transcript]
- In practice, we use α = 1 (see the sketch below)
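The theorem body did not survive transcription, but slide 10's soft boosting admits a sketch. This is an assumed reading rather than the paper's exact formulation: margin errors on the adaptation data are combined with margin errors on the unadapted model's support vectors, the latter weighted by α (α = 1 per the slide); all names below are mine.

```python
import numpy as np

def soft_boosting_objective(w, b, X_adapt, y_adapt, X_sv, y_sv, alpha, C):
    # Assumed soft-boosting objective for a linear SVM:
    #   0.5 * ||w||^2 + C * (margin errors on adaptation data
    #                        + alpha * margin errors on the unadapted
    #                          model's support vectors (X_sv, y_sv))
    hinge = lambda m: np.maximum(0.0, 1.0 - m)  # margin error per sample
    adapt_errors = hinge(y_adapt * (X_adapt @ w + b)).sum()
    sv_errors = hinge(y_sv * (X_sv @ w + b)).sum()
    return 0.5 * np.dot(w, w) + C * (adapt_errors + alpha * sv_errors)
```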
12. Regularized Adaptation for MLPs
- Extend this to a two-layer MLP, with separate regularization coefficients µ (last layer) and γ (first layer)
- Relations with existing MLP adaptation algorithms:
  - Linear input network: µ → ∞
  - Retrain from the SI model: µ = 0, γ = 0
  - Retrain the last layer: µ = 0, γ → ∞
  - Retrain the first layer: µ → ∞, γ = 0
  - Regularized: choose µ, γ on a dev set (see the sketch below)
- This also relates to MaxEnt adaptation using Gaussian priors
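A numpy sketch of the two-layer case. The mapping of µ to the last (hidden-to-output) layer and γ to the first (input-to-hidden) layer is inferred from the limiting cases above; the sigmoid hidden units and softmax output are assumptions.

```python
import numpy as np

def mlp_adaptation_objective(V, W, V0, W0, X, y, mu, gamma):
    # Two-layer MLP: hidden h = sigmoid(V x), output p = softmax(W h).
    # Log loss on the adaptation data plus per-layer penalties toward
    # the speaker-independent weights (V0, W0):
    #   loss + gamma * ||V - V0||^2 + mu * ||W - W0||^2
    # Limiting cases recover the algorithms listed above:
    #   mu = gamma = 0       -> retrain both layers from the SI model
    #   mu = 0, gamma -> inf -> first layer frozen (retrain last layer)
    #   mu -> inf, gamma = 0 -> last layer frozen (retrain first layer)
    H = 1.0 / (1.0 + np.exp(-(X @ V.T)))             # (T, n_hidden)
    logits = H @ W.T                                 # (T, n_classes)
    logits -= logits.max(axis=1, keepdims=True)      # stable softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(y)), y].sum()     # y: integer labels
    return (nll
            + gamma * np.sum((V - V0) ** 2)
            + mu * np.sum((W - W0) ** 2))
```

Minimizing this with µ and γ tuned on a dev set is the regularized strategy evaluated in the experiments that follow.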
13. Experiments: Vowel Classification
- Application: the Vocal Joystick
  - A voice-based computer interface for individuals with motor impairments
  - Vowel quality → angle
- Data set (extended)
  - Train/dev/eval: 21/4/10 speakers
  - 6-fold cross-validation
- MLP configuration
  - 7 frames of MFCC deltas as input (stacking sketched below)
  - 50 hidden nodes
- Metric: frame-level classification error rate
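The input construction (7 frames of MFCC deltas) can be sketched as a context-window stack. The centered window and edge padding are assumptions, as is the function name:

```python
import numpy as np

def stack_context(frames, context=7):
    # frames: (T, D) per-frame MFCC-delta features.
    # Returns (T, context * D): each MLP input is a window of `context`
    # consecutive frames centered on frame t, repeating edge frames.
    half = context // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + context].ravel()
                     for t in range(frames.shape[0])])
```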
14. Varying Adaptation Time

Frame-level error rates (%); row labels for the five adaptation methods did not survive transcription:

        4-class                              8-class
        1s          2s          3s           1s           2s           3s
SI      7.60±0.08   7.60±0.08   7.60±0.08    32.02±0.31   32.02±0.31   32.02±0.31
?       1.16        0.41        0.34         13.52        11.81        11.96
?       1.63        0.21        0.53         12.15        9.64         7.88
?       2.93        1.66        1.91         15.45        13.32        11.40
?       0.79        0.23        0.12         11.56        9.12         7.35
?       0.22        0.19        0.12         11.56        8.16         7.30
15-16. Varying vowels in adaptation (3s each)
[Figure: error rates by adapted vowel set; SI baseline ≈ 32%]
17-18. Varying vowels in adaptation (3s total)
[Figure: error rates by adapted vowel set; SI baseline ≈ 32%]
19. Summary
- Drew links between discriminative classifiers
- Presented a general notion of regularized adaptation for discriminative classifiers
  - Natural adaptation strategies for SVMs and MLPs, justified using a maximum margin argument
  - A unified view of different adaptation algorithms
- MLP experiments show superior performance, especially for class-skewed data