Title: Learning Structured Models for Phone Recognition
- Slav Petrov, Adam Pauls, Dan Klein
Acoustic Modeling
Motivation
- Standard acoustic models impose many structural constraints
- We propose an automatic approach
- Use the TIMIT dataset
- MFCC features
- Full-covariance Gaussians
(Young and Woodland, 1994)
Phone Classification
[Figure: example phone /æ/]
HMMs for Phone Classification
[Figure label: Temporal Structure]
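The HMM-based classification idea on these slides can be sketched as follows. This is a minimal sketch, assuming one HMM per phone and scoring a segment with the forward algorithm, then picking the phone whose model gives the highest likelihood. The toy models, the 1-D Gaussian emissions, and all function names (`forward_log_likelihood`, `classify`, etc.) are hypothetical; the setup described here uses full-covariance Gaussians over MFCC vectors.

```python
import math

def safe_log(p):
    """log(p), with log(0) mapped to -inf so impossible transitions drop out."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (a stand-in for full-covariance MFCC models)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def forward_log_likelihood(obs, hmm):
    """log P(obs | hmm) computed with the forward algorithm in log space.

    hmm = (start, trans, emis): start[i] and trans[i][j] are probabilities,
    emis[i] = (mean, var) is the Gaussian of substate i.
    """
    start, trans, emis = hmm
    n = len(start)
    alpha = [safe_log(start[i]) + log_gauss(obs[0], *emis[i]) for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + log_gauss(x, *emis[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

def classify(obs, models):
    """Generative classification: pick the phone whose HMM best explains obs."""
    return max(models, key=lambda phone: forward_log_likelihood(obs, models[phone]))

# Hypothetical two-substate, left-to-right models over a 1-D feature.
models = {
    "ae": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [(1.0, 0.5), (2.0, 0.5)]),
    "d": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [(-1.0, 0.5), (-2.0, 0.5)]),
}
print(classify([0.9, 1.2, 2.1, 1.9], models))  # ae
```

Working in log space keeps long segments from underflowing; the left-to-right transition matrices here mimic the temporal structure of a standard phone HMM.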
Standard subphone/mixture HMM
[Figure labels: Temporal Structure, Gaussian Mixtures]
Model             Error Rate (%)
HMM Baseline      25.1
Our Model
[Figure: Standard Model vs. our model]
- Fully Connected
- Single Gaussians
Hierarchical Baum-Welch Training
[Figure; plotted values: 32.1, 28.7]
Model             Error Rate (%)
HMM Baseline      25.1
5 Split Rounds    21.4
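Hierarchical training alternates splitting substates with Baum-Welch re-estimation. A minimal sketch of one split step, assuming each substate is a single 1-D Gaussian whose mean is nudged in opposite directions for its two children, and whose start and transition mass is shared evenly between children. The function name `split_substates`, the perturbation size `eps`, and the single-substate starting point are illustrative assumptions; the Baum-Welch step after each split is omitted.

```python
import math

def split_substates(start, trans, emis, eps=0.01):
    """Split every substate i into two children, 2*i and 2*i + 1.

    start: n start probabilities; trans: n x n transition probabilities;
    emis: n (mean, var) pairs for 1-D Gaussians.
    Each child inherits its parent's start and transition mass, divided evenly
    between the two children of the target substate, so the split model
    defines the same distribution as before apart from the small shift of
    the child means, which breaks the symmetry for re-estimation.
    """
    n = len(start)
    new_start = [start[i // 2] / 2 for i in range(2 * n)]
    new_trans = [[trans[i // 2][j // 2] / 2 for j in range(2 * n)]
                 for i in range(2 * n)]
    new_emis = []
    for mean, var in emis:
        shift = eps * math.sqrt(var)
        new_emis.append((mean - shift, var))
        new_emis.append((mean + shift, var))
    return new_start, new_trans, new_emis

# Hypothetical starting point: one substate with a standard-normal Gaussian.
model = ([1.0], [[1.0]], [(0.0, 1.0)])
for _ in range(5):  # five split rounds
    model = split_substates(*model)
    # ... Baum-Welch re-estimation of model would run here (omitted) ...
print(len(model[0]))  # 32
```

In this sketch, copying the parent's parameters leaves the likelihood essentially unchanged at the moment of splitting, so any gain has to come from the re-estimation that follows.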
Phone Classification Results
Method                                     Error Rate (%)
GMM Baseline (Sha and Saul, 2006)          26.0
HMM Baseline (Gunawardana et al., 2005)    25.1
SVM (Clarkson and Moreno, 1999)            22.4
Hidden CRF (Gunawardana et al., 2005)      21.7
Our Work                                   21.4
Large Margin GMM (Sha and Saul, 2006)      21.1
Phone Recognition
Standard State-Tied Acoustic Models
No more State-Tying
No more Gaussian Mixtures
Fully connected internal structure
Fully connected external structure
Refinement of the /ih/-phone
Refinement of the /l/-phone
Hierarchical Refinement Results
Model             Error Rate (%)
HMM Baseline      41.7
5 Split Rounds    28.4
Merging
- Not all phones are equally complex
- Compute the log-likelihood loss from merging
[Figure: split model vs. model merged at one node]
Merging Criterion
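The criterion itself is not reproduced in this text. A minimal sketch, assuming an approximation in the style of split-merge training: merging two substates at a single time step ("merged at one node") adds their forward scores and averages their backward scores with weights p1, p2 equal to their relative frequencies, and the per-step drops in log-likelihood are summed. The function name `merge_loss`, the score conventions, and the way p1, p2 are obtained (e.g. from expected counts) are assumptions for illustration.

```python
import math

def merge_loss(a, b, s1, s2, p1, p2):
    """Approximate log-likelihood loss from merging substates s1 and s2.

    a[t][s]: forward score P(x_1..x_{t-1}, state_t = s), emission at t excluded.
    b[t][s]: backward score P(x_t..x_T | state_t = s), emission at t included.
    With these conventions sum_s a[t][s] * b[t][s] = P(x) at every t.
    p1, p2: relative frequencies of s1 and s2 (p1 + p2 == 1).
    Each time step is merged on its own ("merged at one node") and the
    resulting drops in log-likelihood are summed.
    """
    loss = 0.0
    for a_t, b_t in zip(a, b):
        full = sum(x * y for x, y in zip(a_t, b_t))
        rest = sum(a_t[s] * b_t[s] for s in range(len(a_t)) if s not in (s1, s2))
        # Merged node: incoming (forward) mass adds up, outgoing (backward)
        # scores are averaged with the children's relative frequencies.
        merged = rest + (a_t[s1] + a_t[s2]) * (p1 * b_t[s1] + p2 * b_t[s2])
        loss += math.log(full) - math.log(merged)
    return loss

# Hypothetical single-step scores for a two-substate model.
print(merge_loss([[0.9, 0.1]], [[0.8, 0.2]], 0, 1, 0.5, 0.5))
```

Under this sketch, substate pairs with the smallest loss are the cheapest to merge back, which fits the observation that not all phones are equally complex.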
Split and Merge Results
Model             Error Rate (%)
Split Only        28.4
Split Merge       27.3
HMM states per phone
Alignment
Method            Error Rate (%)
Hand Aligned      27.3
Auto Aligned      26.3
Alignment State Distribution
Inference
- State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
- Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
- Transcription: d-ae-d
[Figure labels: Viterbi, Variational, ???]
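The mapping from the state sequence to the phone sequence and then to the transcription is a deterministic projection. A minimal sketch, assuming substate labels are written as a phone name followed by a numeric index (as in the example above); the function names are hypothetical.

```python
from itertools import groupby

def states_to_phones(states):
    """Strip the substate index from each label, e.g. 'ae5' -> 'ae'."""
    return [s.rstrip("0123456789") for s in states]

def phones_to_transcription(phones):
    """Collapse each run of identical phones into a single phone."""
    return [phone for phone, _ in groupby(phones)]

states = "d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5".split("-")
phones = states_to_phones(states)
print("-".join(phones))                           # d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
print("-".join(phones_to_transcription(phones)))  # d-ae-d
```

Because many different state sequences project to the same transcription, the single best state sequence need not yield the most probable transcription. Simply collapsing runs also cannot separate two consecutive tokens of the same phone; a real decoder would have to track phone boundaries explicitly.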
Variational Inference
[Figure: Variational Approximation]
Inference         Error Rate (%)
Viterbi           26.3
Variational       25.1
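The gap between Viterbi and the most probable transcription can be made concrete. Viterbi returns the transcription of the single best state path, while a transcription's total probability sums over every state path that maps to it. The brute-force toy below enumerates all paths of a hypothetical three-substate model to show the two can disagree. It is not the variational approximation itself, which avoids this enumeration (the number of paths grows exponentially with utterance length); all names and numbers are illustrative assumptions.

```python
from collections import defaultdict
from itertools import groupby, product

def transcription(path, names):
    """Map a state path to its phone transcription (strip indices, collapse runs)."""
    phones = (names[s].rstrip("0123456789") for s in path)
    return tuple(phone for phone, _ in groupby(phones))

def path_prob(path, start, trans, emit):
    """Joint probability of one state path; emit[t][s] is the emission likelihood."""
    p = start[path[0]] * emit[0][path[0]]
    for t in range(1, len(path)):
        p *= trans[path[t - 1]][path[t]] * emit[t][path[t]]
    return p

def decode_both(names, start, trans, emit):
    """Brute force over all state paths (toy sizes only).

    Returns (transcription of the single best state path,
             transcription with the largest total probability).
    """
    totals = defaultdict(float)
    best_path, best_p = None, -1.0
    for path in product(range(len(names)), repeat=len(emit)):
        p = path_prob(path, start, trans, emit)
        totals[transcription(path, names)] += p
        if p > best_p:
            best_path, best_p = path, p
    return transcription(best_path, names), max(totals, key=totals.get)

# Hypothetical toy: phone "a" has one substate, phone "b" has two.
names = ["a1", "b1", "b2"]
start = [1 / 3] * 3
trans = [[1 / 3] * 3 for _ in range(3)]
emit = [[0.4, 0.3, 0.3], [0.4, 0.3, 0.3]]
print(decode_both(names, start, trans, emit))  # (('a',), ('b',))
```

Here the single path a1-a1 beats any individual path through "b", but the four paths through b1 and b2 together give "b" more total probability, so Viterbi picks the less probable transcription.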
Phone Recognition Results
Method                                                      Error Rate (%)
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)    27.7
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)     27.1
Our Work                                                    26.1
Bayesian Triphone HMM (Ming and Smith, 1998)                25.6
Heterogeneous classifiers (Halberstadt and Glass, 1998)     24.4
Conclusions
- Minimalist, Automatic Approach
  - Unconstrained
  - Accurate
- Phone Classification
  - Competitive with state-of-the-art discriminative methods despite being generative
- Phone Recognition
  - Better than standard state-tied triphone models
Thank you!
- http://nlp.cs.berkeley.edu