Learning Structured Models for Phone Recognition

Slav Petrov, Adam Pauls, Dan Klein

Slides: 37

Provided by: EEC60

Learn more at: http://nlp.cs.berkeley.edu

Transcript and Presenter's Notes

1
Learning Structured Models for Phone Recognition

Slav Petrov, Adam Pauls, Dan Klein

2
Acoustic Modeling
3
Motivation

Standard acoustic models impose many structural constraints
We propose an automatic approach
Use TIMIT Dataset
MFCC features
Full covariance Gaussians

(Young and Woodland, 1994)
4
Phone Classification
5
Phone Classification
æ
6
HMMs for Phone Classification
7
HMMs for Phone Classification
Temporal Structure
8
Standard subphone/mixture HMM
Temporal Structure
Gaussian Mixtures
Model           Error rate (%)
HMM Baseline    25.1
9
Our Model
Standard Model
Fully Connected
Single Gaussians
10
Hierarchical Baum-Welch Training
32.1
28.7
HMM Baseline 25.1
5 Split rounds 21.4
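
The slides do not spell out how a split round works, so the following is only a minimal sketch of one plausible splitting step, under stated assumptions: each state carries a single Gaussian mean, a split duplicates the state, perturbs the two means in opposite directions, and divides the incoming transition mass equally between the copies. The function `split_state`, the parameter `eps`, and the toy numbers are hypothetical and not taken from the presentation. A Baum-Welch (EM) pass would normally follow each split; it is omitted here.

```python
import random


def split_state(means, trans, k, eps=0.01, seed=0):
    """Split state k into two substates and return new (means, trans).

    Assumption (not from the slides): the new substate is appended at the
    end, both copies inherit the outgoing transitions of k, and every
    transition into k is shared equally between the two copies.
    """
    rng = random.Random(seed)
    n = len(means)

    # Perturb the two copies in opposite directions so EM can separate them.
    noise = [eps * rng.uniform(-1.0, 1.0) for _ in means[k]]
    new_means = [list(m) for m in means]
    new_means[k] = [m + d for m, d in zip(means[k], noise)]
    new_means.append([m - d for m, d in zip(means[k], noise)])

    # Halve the incoming mass of k and give the other half to the new copy.
    new_trans = []
    for i in range(n):
        half = trans[i][k] / 2.0
        row = list(trans[i])
        row[k] = half
        row.append(half)
        new_trans.append(row)
    # The new substate starts with the same outgoing distribution as k.
    new_trans.append(list(new_trans[k]))
    return new_means, new_trans


# Hypothetical two-state model with 2-dimensional means.
means = [[0.0, 1.0], [2.0, 3.0]]
trans = [[0.7, 0.3], [0.4, 0.6]]
new_means, new_trans = split_state(means, trans, k=0)
print(len(new_means), "states after one split")
```

Each row of the new transition matrix still sums to one, so the split model assigns the same likelihood as the original before retraining, apart from the small mean perturbation.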
11
Phone Classification Results
Method                                      Error rate (%)
GMM Baseline (Sha and Saul, 2006)           26.0
HMM Baseline (Gunawardana et al., 2005)     25.1
SVM (Clarkson and Moreno, 1999)             22.4
Hidden CRF (Gunawardana et al., 2005)       21.7
Our Work                                    21.4
Large Margin GMM (Sha and Saul, 2006)       21.1
12
Phone Recognition
13
Standard State-Tied Acoustic Models
14
No more State-Tying
15
No more Gaussian Mixtures
16
Fully connected internal structure
17
Fully connected external structure
18
Refinement of the /ih/-phone
19
Refinement of the /ih/-phone
20
Refinement of the /ih/-phone
21
Refinement of the /ih/-phone
22
Refinement of the /l/-phone
23
Hierarchical Refinement Results
HMM Baseline 41.7
5 Split Rounds 28.4
24
Merging

Not all phones are equally complex
Compute log likelihood loss from merging

Split model
Merged at one node
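
The exact merging criterion is on a slide whose content did not survive in this transcript. As a rough illustration of "log likelihood loss from merging", the sketch below compares the log likelihood of some frames under two weighted substates with the log likelihood under a single moment-matched Gaussian. Everything here is an assumption made for illustration: one-dimensional features, the function names, and the toy data. It is a simplified stand-in, not the authors' criterion.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def merge_loss(frames, w1, m1, v1, w2, m2, v2):
    """Log likelihood lost by replacing two substates with one Gaussian.

    w1 and w2 are the relative weights of the substates (w1 + w2 == 1).
    The merged Gaussian matches the mean and variance of the mixture.
    """
    m = w1 * m1 + w2 * m2
    v = w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)
    ll_split = sum(
        math.log(w1 * math.exp(log_gauss(x, m1, v1))
                 + w2 * math.exp(log_gauss(x, m2, v2)))
        for x in frames
    )
    ll_merged = sum(log_gauss(x, m, v) for x in frames)
    return ll_split - ll_merged


# Hypothetical frames drawn around two well-separated substates.
frames = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
loss_distinct = merge_loss(frames, 0.5, -2.0, 0.05, 0.5, 2.0, 0.05)
loss_identical = merge_loss(frames, 0.5, 0.0, 4.0, 0.5, 0.0, 4.0)
print(loss_distinct, loss_identical)
```

In this toy setting, merging two identical substates costs nothing, while merging two well-separated ones costs a lot; a split whose loss is small is a natural candidate for being merged back.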
25
Merging Criterion
26
Split and Merge Results
Split Only 28.4
Split and Merge 27.3
27
HMM states per phone
28
HMM states per phone
29
HMM states per phone
30
Alignment
Results
Hand Aligned 27.3
Auto Aligned 26.3
31
Alignment State Distribution
32
Inference

State sequence
d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
Phone sequence
d - d - d - d - ae - ae - ae - ae - d - d - d - d - d
Transcription
d - ae - d

Viterbi
Variational
???
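
The three levels on this slide are related by two deterministic mappings, which can be sketched as follows. The sketch assumes that a state name is a phone label followed by a numeric substate index (as in `ae5`) and that the transcription is obtained by collapsing runs of identical phones; the helper names are hypothetical.

```python
import re
from itertools import groupby


def state_to_phone(state):
    """Strip the trailing substate index, e.g. 'ae5' -> 'ae'."""
    return re.sub(r"\d+$", "", state)


def collapse(states):
    """Map a state sequence to (phone sequence, transcription)."""
    phones = [state_to_phone(s) for s in states]
    # Collapse each run of identical phones into a single symbol.
    transcription = [phone for phone, _ in groupby(phones)]
    return phones, transcription


states = "d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5".split("-")
phones, transcription = collapse(states)
print(" - ".join(phones))
print(" - ".join(transcription))
```

Many state sequences map to the same phone sequence, and many phone sequences map to the same transcription, which is why the most likely state sequence is not necessarily the most likely transcription. Note that collapsing runs this way cannot represent the same phone occurring twice in a row.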
33
Variational Inference
Variational Approximation
Viterbi 26.3
Variational 25.1
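
A toy example may help show why decoding the single best state sequence can differ from decoding the best phone sequence. The posterior below is entirely hypothetical, and the brute-force summation stands in for the idea of scoring phone sequences rather than state sequences; it is not the variational approximation used in the presentation, which is needed precisely because such sums are too expensive for real models.

```python
import re
from collections import defaultdict

# Hypothetical posterior over two-frame state sequences (sums to 1).
posterior = {
    ("a1", "a1"): 0.30,
    ("b1", "b1"): 0.25,
    ("b2", "b2"): 0.25,
    ("b1", "b2"): 0.20,
}


def to_phones(states):
    """Drop substate indices: ('b1', 'b2') -> ('b', 'b')."""
    return tuple(re.sub(r"\d+$", "", s) for s in states)


# Viterbi-style choice: the single most probable state sequence.
best_states = max(posterior, key=posterior.get)

# Phone-level choice: sum the mass of all state sequences per phone sequence.
phone_mass = defaultdict(float)
for states, prob in posterior.items():
    phone_mass[to_phones(states)] += prob
best_phones = max(phone_mass, key=phone_mass.get)

print("best state sequence:", best_states, "->", to_phones(best_states))
print("best phone sequence:", best_phones)
```

Here the best state sequence belongs to phone `a`, but the probability of phone `b` is spread over several substate paths whose total mass is larger, so a decoder that scores phone sequences picks `b` instead.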
34
Phone Recognition Results
Method                                                      Error rate (%)
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)    27.7
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)     27.1
Our Work                                                    26.1
Bayesian Triphone HMM (Ming and Smith, 1998)                25.6
Heterogeneous classifiers (Halberstadt and Glass, 1998)     24.4
35
Conclusions

Minimalist, Automatic Approach
Unconstrained
Accurate
Phone Classification
Competitive with state-of-the-art discriminative methods despite being generative
Phone Recognition
Better than standard state-tied triphone models