Philip Jackson and Martin Russell - PowerPoint PPT Presentation

1
Models of speech dynamics in a segmental-HMM
recognizer using intermediate linear
representations
  • Philip Jackson and Martin Russell

Electronic, Electrical and Computer Engineering
http://web.bham.ac.uk/p.jackson/balthasar/
2
Speech dynamics into ASR
INTRODUCTION
3
Conventional model
[Figure: HMM diagram. Labels: acoustic observations; acoustic PDF; states 1 to 4]
INTRODUCTION
4
Linear-trajectory model
[Figure: segmental HMM diagram. Labels: acoustic observations; acoustic PDF; articulatory-to-acoustic mapping W; intermediate layer; states 1 to 4]
INTRODUCTION
5
Multi-level Segmental HMM
  • segmental finite-state process
  • intermediate articulatory layer
  • linear trajectories
  • mapping required
  • linear transformation
  • radial basis function network

INTRODUCTION
6
Estimation of linear mapping
  • Matched sequences of intermediate (articulatory) and acoustic vectors
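
The equations for this slide did not survive in the transcript. As a minimal sketch of how a linear mapping can be estimated from matched frame sequences, the example below fits an affine map by ordinary least squares. The function name `estimate_linear_mapping`, the bias term `b`, and the synthetic data are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def estimate_linear_mapping(X, Y):
    """Least-squares fit of Y ~ X @ W.T + b from matched frames.

    X: (N, dx) intermediate-layer (e.g. articulatory) vectors.
    Y: (N, dy) acoustic vectors, frame-aligned with X.
    Returns W with shape (dy, dx) and bias b with shape (dy,).
    """
    # Append a column of ones so the bias is estimated jointly with W.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    W = coef[:-1].T
    b = coef[-1]
    return W, b

# Synthetic, noise-free check (hypothetical data, not MOCHA or TIMIT).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
W_true = rng.normal(size=(4, 3))
b_true = rng.normal(size=4)
Y = X @ W_true.T + b_true
W_est, b_est = estimate_linear_mapping(X, Y)
```

With noise-free data the fit recovers `W_true` and `b_true` up to numerical precision. A piecewise-linear mapping can be approximated by running the same fit separately on the frames of each phone category.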

THEORY
7
Linear-trajectory equations
  • Defined as
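
The defining equations are missing from this transcript. One common way to write a linear trajectory over a segment of duration τ, using the midpoint and slope parameters named on the following slides, is sketched below. The symbols c, m, τ, W and Σ are assumptions chosen for illustration and may differ from the authors' notation.

```latex
% Linear trajectory in the intermediate space, centred on the segment midpoint
f(t) = c + m\,(t - \bar{t}), \qquad \bar{t} = \frac{\tau + 1}{2}, \qquad t = 1, \dots, \tau

% Acoustic observation generated through a linear mapping W with Gaussian noise
y_t \sim \mathcal{N}\bigl(W f(t),\, \Sigma\bigr)
```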

THEORY
8
Training the model parameters
  • For optimal least-squares estimates (acoustic
    domain)

midpoint
slope
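
The slide's equations were lost in extraction. As a hedged sketch, the ordinary least-squares midpoint and slope for a single segment can be computed as below, with time centred on the segment midpoint. The function name `fit_linear_trajectory` and the single-segment setting are assumptions; the original estimates may pool statistics over many segments.

```python
import numpy as np

def fit_linear_trajectory(Y):
    """Least-squares midpoint c and slope m for one segment.

    Y: (T, d) array of feature vectors, one row per frame.
    Model: Y[t] ~ c + m * (t - t_bar), with t = 1..T and t_bar = (T + 1) / 2.
    """
    T = Y.shape[0]
    t = np.arange(1, T + 1) - (T + 1) / 2.0  # centred time, sums to zero
    c = Y.mean(axis=0)                       # midpoint is the frame mean
    denom = np.sum(t ** 2)
    # A one-frame segment carries no slope information; fall back to zero.
    m = (t @ Y) / denom if denom > 0 else np.zeros(Y.shape[1])
    return c, m

# Noise-free example: a 5-frame segment with known midpoint and slope.
T = 5
times = np.arange(1, T + 1) - (T + 1) / 2.0
Y = np.array([2.0, -1.0]) + np.outer(times, [0.5, 0.25])
c, m = fit_linear_trajectory(Y)
```

Because the centred time index sums to zero, the midpoint and slope estimates decouple: the midpoint is the frame mean and the slope is a time-weighted average of the frames.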
THEORY
9
Training the model parameters
  • For optimal least-squares estimates
    (articulatory domain)

midpoint
slope
THEORY
10
Training the model parameters
  • For optimal maximum-likelihood estimates
    (articulatory domain)

midpoint
slope
THEORY
11
Tests on MOCHA
  • Southern British English, at 16 kHz (Wrench, 2000)
  • MFCC13: acoustic features, incl. zeroth
  • articulatory x- and y-coords from 7 EMA coils
  • PCA9Lx: first nine articulatory modes plus the
    laryngograph log energy
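
A feature set like PCA9Lx could be assembled as sketched below, assuming the articulatory modes are principal components of the 14 EMA coordinates (7 coils, x and y). That reading, the function name `pca_modes`, and the random stand-in data are assumptions; the slides do not give the exact procedure.

```python
import numpy as np

def pca_modes(ema, n_modes=9):
    """Project EMA frames onto their leading principal components.

    ema: (N, 14) array of x and y coordinates from 7 coils.
    Returns the (N, n_modes) scores, the (n_modes, 14) components and the mean.
    """
    mean = ema.mean(axis=0)
    centred = ema - mean
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_modes]
    scores = centred @ components.T
    return scores, components, mean

# Random stand-in data (hypothetical; not taken from MOCHA).
rng = np.random.default_rng(0)
ema = rng.normal(size=(500, 14))
lx_log_energy = rng.normal(size=(500, 1))

scores, components, mean = pca_modes(ema)
# Nine articulatory modes plus laryngograph log energy: 10 features per frame.
features = np.hstack([scores, lx_log_energy])
```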

METHOD
12
MOCHA baseline performance
  • Constant-trajectory SHMM (ID_0)
  • Linear-trajectory SHMM (ID_1)

RESULTS
13
Performance across mappings
RESULTS
14
Phone categorisation
Grouping  No.  Description
A         1    all data
B         2    silence; speech
C         6    linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
D         10   as (Deng and Ma, 2000): silence; vowel; liquid; nasal; UV fric; /s,ch/; V fric; /z,jh/; UV stop; V stop
E         10   discrete articulatory regions
F         49   silence; individual phones
METHOD
15
Tests on TIMIT
  • North American English, at 8 kHz
  • MFCC13: acoustic features, incl. zeroth
  • F1-3: formants F1, F2 and F3, estimated by Holmes
    formant tracker
  • F1-3BE5: five band energies added
  • PFS12: synthesiser control parameters

METHOD
16
TIMIT baseline performance
  • Constant-trajectory SHMM (ID_0)
  • Linear-trajectory SHMM (ID_1)

RESULTS
17
Performance across feature sets
RESULTS
18
Performance across groupings
RESULTS
19
Results across groupings
RESULTS
20
Model visualisation
  • Original acoustic data

Constant-trajectory model
Linear-trajectory model (c,F)
DISCUSSION
21
Conclusions
  • Developed framework for speech dynamics in an
    intermediate space
  • Linear trajectories with a piecewise-linear mapping: performance
    bounded by that of linear trajectories in acoustic space
  • Near-optimal performance achieved:
  • for more than 3 formant parameters
  • for 6 or more linear mappings
  • Formants and articulatory parameters gave
    qualitatively similar results
  • What next?

SUMMARY
22
Further work
  • Complete experiments with lang. model
  • Include segment duration models
  • Derive pseudo-articulatory representations by
    unsupervised (embedded) training
  • Implement non-linear mapping (i.e., RBF)
  • Further information
  • here and now
  • p.jackson@bham.ac.uk
  • web.bham.ac.uk/p.jackson/balthasar

SUMMARY