1
TANDEM OBSERVATION MODELS
2
Introduction
  • Tandem is a method for using the predictions of
    an MLP as observation vectors in generative
    models, e.g. HMMs
  • Extensively used in the ICSI/SRI systems: 10-20%
    improvement for English, Arabic, and Mandarin
  • Most previous work used phone MLPs for deriving
    tandem features (e.g., Hermansky et al. '00 and
    Morgan et al. '05)
  • We explore tandem based on articulatory MLPs
  • Similar to the approach in Kirchhoff '99
  • Questions
  • Are articulatory tandems better than phonetic
    ones?
  • Are factored observation models for tandem and
    acoustic (e.g. PLP) observations better than the
    observation concatenation approach?

3
Tandem Processing Steps
  • MLP posteriors are processed to make them
    Gaussian-like
  • There are 8 articulatory MLPs; their outputs are
    joined together at the input (64 dims)
  • PCA reduces dimensionality to 26 (95% of the
    total variance)
  • Use this 26-dimensional vector as acoustic
    observations in an HMM or some other model
  • The tandem features are usually used in
    combination w/ a standard feature, e.g. PLP
    (sketched below)
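A minimal Python/NumPy sketch of these processing steps, assuming per-frame MLP posteriors are already available; the log transform follows the "log outputs of separate MLPs" note later in the deck, while the function name and the exact PCA/KLT recipe are illustrative rather than the system's actual implementation:

```python
import numpy as np

def tandem_features(mlp_posteriors, plp, var_kept=0.95):
    """Turn per-frame MLP posteriors into tandem features and append PLPs.

    mlp_posteriors: list of 8 arrays, each (n_frames, n_classes_i), one per
                    articulatory MLP (64 dims total when concatenated)
    plp:            (n_frames, n_plp) standard PLP features
    """
    # 1. Gaussianize the posteriors, e.g. by taking logs
    logs = [np.log(p + 1e-10) for p in mlp_posteriors]

    # 2. Join the outputs of all MLPs at the input (64 dims here)
    x = np.concatenate(logs, axis=1)

    # 3. PCA/KLT: keep enough components for ~95% of the variance (26 dims here)
    x0 = x - x.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(x0, rowvar=False))
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    k = np.searchsorted(np.cumsum(eigval) / eigval.sum(), var_kept) + 1
    tandem = x0 @ eigvec[:, :k]

    # 4. Use as acoustic observations, typically appended to a standard feature
    return np.concatenate([plp, tandem], axis=1)
```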

4
Tandem Observation Models
  • Feature concatenation: simply append tandems to
    PLPs
  • All of the standard modeling methods are
    applicable to this meta-observation vector
    (e.g., MLLR, MMIE, and HLDA)
  • Factored models: tandem and PLP distributions are
    factored at the HMM state output distributions
  • - Potentially a more efficient use of free
    parameters, especially if the streams are
    conditionally independent
  • Can use, e.g., separate triphone clusters for
    each observation stream (see the sketch below)
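A sketch contrasting the two state output distributions, using single diagonal-covariance Gaussians per stream purely for illustration (real systems use mixture densities); the helper and parameter names are hypothetical:

```python
import numpy as np

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Concatenation: one density over the appended [PLP, tandem] meta-vector
def loglik_concat(plp, tandem, state):
    o = np.concatenate([plp, tandem])
    return diag_gauss_logpdf(o, state["mean"], state["var"])

# Factoring: p(plp, tandem | state) = p(plp | state) * p(tandem | state),
# i.e. the two streams are modeled as conditionally independent given the state
def loglik_factored(plp, tandem, state):
    return (diag_gauss_logpdf(plp, state["plp_mean"], state["plp_var"])
            + diag_gauss_logpdf(tandem, state["tan_mean"], state["tan_var"]))
```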

5
Articulatory vs. Phone Tandems
Model                                 Test WER (%)
PLP                                   67.7
PLP / Phone Tandem (SVBD)             63.0
PLP / Articulatory Tandem (SVBD)      62.3
PLP / Articulatory Tandem (Fisher)    59.7
  • Monophone models on the 500-word vocabulary task
    w/o alignments; feature-concatenated PLP/tandem
    models
  • All tandem systems are significantly better than
    PLP alone
  • Articulatory tandems are as good as phone tandems
  • Articulatory tandems from MLPs trained on Fisher
    (1776 hrs) outperform those from MLPs trained on
    SVB (3 hrs)

6
Concatenation vs. Factoring
Model                         Task (vocab size)   Test WER (%)
PLP                           10                  24.5
PLP / Tandem Concatenation    10                  21.1
PLP x Tandem Factoring        10                  19.7
PLP                           500                 67.7
PLP / Tandem Concatenation    500                 59.7
PLP x Tandem Factoring        500                 59.1
  • Monophone models w/o alignments
  • All tandem results are significantly better than
    the PLP baseline
  • Consistent improvements from factoring;
    statistically significant on the 500-word task

7
Triphone Experiments
Model                         # of Clusters   Test WER (%)
PLP                           477             59.2
PLP / Tandem Concatenation    880             55.0
PLP x Tandem Factoring        467 x 641       53.8
  • 500-word vocabulary task w/o alignments
  • PLP x Tandem factoring uses separate decision
    trees for PLP and tandem, as well as factored
    pdfs
  • A significant improvement from factoring over the
    feature concatenation approach
  • All pairs of results are statistically
    significant

8
Observation factoring and weight tuning
Results
WER              Validation set   Test set
Factored         58.7             59.5
Fully-factored   R                R

Dimensions of streams
Stream     before KLT   after KLT
All MLPs   64           26
dg1        6            4
frt        7            5
glo        4            2
ht         8            5
nas        3            2
pl1        10           7
rou        3            2
vow        23           13

[Diagrams: factored tandem and fully-factored tandem models; in the fully
factored model the phone state conditions the PLPs and the log outputs of
the separate MLPs (dg1, pl1, rd, ...) as separate streams; see the sketch
below]
Dims after KLT account for 95% of the variance
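A rough sketch of the fully-factored output distribution, treating the PLPs and each per-MLP stream as conditionally independent given the phone state; the per-stream dimensions come from the table above, while the data structures and use of single Gaussians (via SciPy) are assumptions for illustration:

```python
from scipy.stats import multivariate_normal

# Per-stream dimensionalities after KLT, from the table above
STREAM_DIMS = {"dg1": 4, "frt": 5, "glo": 2, "ht": 5,
               "nas": 2, "pl1": 7, "rou": 2, "vow": 13}

def loglik_fully_factored(plp, streams, state):
    """Fully-factored state output score:
    log p(plp, s_1, ..., s_8 | state) = log p(plp | state) + sum_i log p(s_i | state)

    streams: dict mapping stream name -> per-frame KLT'd log-MLP output vector
    state:   dict with a (mean, cov) pair for "plp" and for each stream
             (single Gaussians here just for illustration; real systems use mixtures)
    """
    ll = multivariate_normal.logpdf(plp, *state["plp"])
    for name, x in streams.items():
        ll += multivariate_normal.logpdf(x, *state[name])
    return ll
```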
9
Weight tuning
[Plots: factored and fully-factored models as a function of the MLP weight;
language model tuned for PLP weight = 1]
  • Weight tuning in progress (see the sketch below)
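A sketch of how stream weights could enter the combined score during tuning, assuming a log-linear combination with the PLP weight held at 1 (as stated above); the exact weighting scheme and the candidate grid are assumptions, not the deck's recipe:

```python
def weighted_loglik(loglik_plp, loglik_mlp, mlp_weight, plp_weight=1.0):
    """Log-linear stream combination: the PLP stream weight stays at 1
    while the MLP (tandem) stream weight is swept during tuning."""
    return plp_weight * loglik_plp + mlp_weight * loglik_mlp

# e.g. sweep the MLP weight and keep the value with the lowest validation WER
candidate_mlp_weights = [0.25, 0.5, 0.75, 1.0, 1.25]
```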
10
Summary
  • Tandem features w/ PLPs outperform PLPs alone for
    both monophones and triphones
  • 8-13% relative improvements (statistically
    significant)
  • Articulatory tandems are as good as phone tandems
  • - Further comparisons w/ phone MLPs trained on
    Fisher
  • Factored models look promising (significant
    results on the 500-word vocabulary task)
  • - Further experiments w/ tying and initialization
  • - Judiciously selected dependencies between the
    factored vectors, instead of complete
    independence