Title: Automatic Speech Recognition Introduction
Slide 1: Automatic Speech Recognition: Introduction
- Readings: Jurafsky & Martin, 7.1-2
- HLT Survey, Chapter 1
Slide 2: The Human Dialogue System
Slide 3: The Human Dialogue System
Slide 4: Computer Dialogue Systems
(Diagram: a pipeline of components. Audition and Automatic Speech Recognition turn the speech signal into words; Natural Language Understanding maps words to a logical form; Dialogue Management and Planning operate on the logical form; Natural Language Generation produces words; Text-to-speech turns words back into a signal.)
Slide 5: Computer Dialogue Systems
(Same pipeline with abbreviated labels: Audition, ASR, NLU, Dialogue Mgmt., Planning, NLG, Text-to-speech; edges labeled signal, words, and logical form.)
Slide 6: Parameters of ASR Capabilities
- Different types of tasks come with different difficulties:
- Speaking mode (isolated words / continuous speech)
- Speaking style (read / spontaneous)
- Enrollment (speaker-independent / speaker-dependent)
- Vocabulary (small: < 20 words / large: > 20,000 words)
- Language model (finite state / context sensitive)
- Perplexity (small: < 10 / large: > 100)
- Signal-to-noise ratio (high: > 30 dB / low: < 10 dB)
- Transducer (high-quality microphone / telephone)
Slide 7: The Noisy Channel Model
(Diagram: a message passes through a noisy channel to produce a signal; the decoder recovers the message from the signal.)
Decoding model: find Message* = argmax P(Message | Signal). But how do we represent each of these things?
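Spelled out with Bayes' rule, the decoding objective splits into a channel (acoustic) term and a source (language) term; this is the same decomposition the Bayesian-inference slides later in the deck derive in full:

    \hat{M} = \arg\max_{M} P(M \mid S)
            = \arg\max_{M} \frac{P(S \mid M)\,P(M)}{P(S)}
            = \arg\max_{M} P(S \mid M)\,P(M)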
Slide 8: ASR using HMMs
- Try to solve P(Message | Signal) by breaking the problem up into separate components
- Most common method: Hidden Markov Models
- Assume that a message is composed of words
- Assume that words are composed of sub-word parts (phones)
- Assume that phones have some sort of acoustic realization
- Use probabilistic models for matching acoustics to phones to words
Slide 9: HMMs: The Traditional View
(Diagram: a Markov model backbone composed of phones, hidden because we don't know the correspondences. The words "go home" expand into the phone sequence g, o, h, o, m, which is aligned to the acoustic observations x0 through x9. Each line in the diagram represents a probability estimate; more on this later.)
Slide 10: HMMs: The Traditional View
(Same diagram: the phone backbone for "go home" aligned to the acoustic observations x0 through x9.)
Even with the same word hypothesis, we can have different alignments. Also, we have to search over all word hypotheses.
Slide 11: HMMs as Dynamic Bayesian Networks
(Diagram: the Markov model backbone composed of phones for "go home". Each time step has a hidden state labeled with a phone, q0=g, q1=o, q2=o, q3=o, q4=h, q5=o, q6=o, q7=o, q8=m, q9=m, and each state emits the corresponding acoustic observation x0 through x9.)
Slide 12: HMMs as Dynamic Bayesian Networks
(Same diagram: phone-labeled states q0 through q9 emitting the acoustic observations x0 through x9.)
ASR: what is the best assignment to q0...q9 given x0...x9?
Slide 13: Hidden Markov Models and DBNs
(Diagram: the same model shown in its DBN representation and in its traditional Markov model representation.)
Slide 14: Parts of an ASR System
(Diagram: Feature Calculation, Acoustic Modeling, and Language Modeling feed a SEARCH component that outputs "The cat chased the dog". The acoustic model maps features to phones such as k and @; the language model box shows bigram probabilities: cat dog 0.00002, cat the 0.0000005, the cat 0.029, the dog 0.031, the mail 0.054.)
Slide 15: Parts of an ASR System
(Same diagram, annotated with each component's role: produces the acoustics (xt), maps acoustics to phones, maps phones to words, and strings words together.)
Slide 16: Feature Calculation
Slide 17: Feature Calculation
(Spectrogram: frequency versus time.)
Find the energy at each time step in each frequency channel.
Slide 18: Feature Calculation
(Spectrogram: frequency versus time.)
Take the inverse Discrete Fourier Transform to decorrelate the frequencies.
Slide 19: Feature Calculation
Input: the speech signal (shown as an image on the original slide).
Output: a matrix of feature values, e.g.
  -0.1  0.3  1.4 -1.2  2.3  2.6
   0.2  0.1  1.2 -1.2  4.4  2.2
  -6.1 -2.1  3.1  2.4  1.0  2.2
   0.2  0.0  1.2 -1.2  4.4  2.2
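As a concrete illustration of the steps on slides 17-19 (energy per frequency channel, then a decorrelating transform), here is a minimal Python sketch. The frame length, hop size, window choice, and number of coefficients are illustrative assumptions, and a real front end would insert a mel filterbank before the log, which this sketch skips:

    import numpy as np

    def dct_ii(x):
        # Type-II DCT, written out directly to avoid a scipy dependency.
        n = len(x)
        k = np.arange(n)
        return np.array([np.sum(x * np.cos(np.pi * (k + 0.5) * m / n)) for m in range(n)])

    def simple_features(signal, frame_len=400, hop=160, n_coeffs=13):
        # Framewise log spectral energy, decorrelated with a DCT (slides 17-18).
        frames = []
        for start in range(0, len(signal) - frame_len + 1, hop):
            frame = signal[start:start + frame_len] * np.hamming(frame_len)
            spectrum = np.abs(np.fft.rfft(frame)) ** 2        # energy per frequency channel
            log_energy = np.log(spectrum + 1e-10)
            frames.append(dct_ii(log_energy)[:n_coeffs])      # keep the low-order coefficients
        return np.array(frames)                               # shape: (num_frames, n_coeffs)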
Slide 20: Robust Speech Recognition
- Different schemes have been developed for dealing with noise and reverberation
- Additive noise: reduce the effects of particular frequencies
- Convolutional noise: remove the effects of linear filters (cepstral mean subtraction)
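Cepstral mean subtraction is easy to state in code: subtract the per-utterance mean of each cepstral coefficient, since a fixed linear filter shows up as a constant offset in the log-cepstral domain. A minimal sketch, assuming features shaped (num_frames, num_coefficients) as produced by the sketch above:

    import numpy as np

    def cepstral_mean_subtraction(features):
        # Remove the per-utterance mean of each cepstral coefficient;
        # a time-invariant channel contributes a constant offset here.
        return features - features.mean(axis=0, keepdims=True)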
Slide 21: Now What?
(Diagram: the feature matrix from slide 19 on the left, "???" in the middle, and the word output "That you" on the right.)
Slide 22: Machine Learning!
(Same diagram, with the "???" filled in: pattern recognition with HMMs maps the feature matrix to the words "That you".)
Slide 23: Hidden Markov Models (again!)
P(state_t+1 | state_t): pronunciation / language models
P(acoustics_t | state_t): acoustic model
Slide 24: Acoustic Model
- Assume that you can label each vector with a phonetic label
- Collect all of the examples of a phone together and build a Gaussian model (or some other statistical model, e.g. neural networks)
N_a(μ, Σ) ≈ P(X | state = a)
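A minimal sketch of that idea in Python: group labeled feature vectors by phone, fit one Gaussian per phone, and score new vectors with the log-density. The diagonal-covariance form and the variance floor are simplifying assumptions for illustration, not details from the slide:

    import numpy as np
    from collections import defaultdict

    def fit_phone_gaussians(vectors, phone_labels):
        # Fit a diagonal-covariance Gaussian N_a(mean, var) for each phone a.
        grouped = defaultdict(list)
        for x, phone in zip(vectors, phone_labels):
            grouped[phone].append(x)
        models = {}
        for phone, xs in grouped.items():
            xs = np.array(xs)
            models[phone] = (xs.mean(axis=0), xs.var(axis=0) + 1e-4)   # variance floor
        return models

    def log_likelihood(x, model):
        # log P(x | state = a) under a diagonal Gaussian (mean, var).
        mean, var = model
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)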
Slide 25: Building up the Markov Model
- Start with a model for each phone
- Typically, we use 3 states per phone to give a minimum duration constraint, but we ignore that here (a small sketch of the 3-state structure follows below)
(Diagram: a phone model drawn as states with arcs, each arc labeled with a transition probability.)
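As an illustration of the 3-states-per-phone structure mentioned above, here is a minimal sketch that builds a left-to-right transition matrix for a single phone model. The self-loop probability is an arbitrary illustrative value:

    import numpy as np

    def phone_transition_matrix(n_states=3, self_loop=0.6):
        # Left-to-right phone model: each state either repeats (self-loop)
        # or advances to the next state, giving a minimum duration of n_states frames.
        A = np.zeros((n_states + 1, n_states + 1))     # last index is an exit state
        for i in range(n_states):
            A[i, i] = self_loop
            A[i, i + 1] = 1.0 - self_loop
        return A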
Slide 26: Building up the Markov Model
- The pronunciation model gives the connections between phones and words
- Multiple pronunciations are possible
(Diagram: a pronunciation network built from the phones t, m, ow, ey, and ah, with alternative arcs for different pronunciations of the same word.)
Slide 27: Building up the Markov Model
- The language model gives the connections between words (e.g., a bigram grammar)
(Diagram: the final phone t of the word "that" connects to following words with probabilities p(he | that) and p(you | that).)
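A bigram grammar of this kind can be estimated directly from counts over a text corpus. A minimal sketch with add-one smoothing (the smoothing choice is an assumption for illustration, not something the slide specifies):

    from collections import Counter

    def bigram_probs(sentences):
        # Estimate p(w2 | w1) from tokenized sentences, with add-one smoothing.
        unigrams, bigrams, vocab = Counter(), Counter(), set()
        for words in sentences:
            padded = ["<s>"] + words + ["</s>"]
            vocab.update(padded)
            unigrams.update(padded[:-1])
            bigrams.update(zip(padded[:-1], padded[1:]))
        V = len(vocab)
        return lambda w1, w2: (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    p = bigram_probs([["that", "you", "go"], ["that", "he", "went"]])
    print(p("that", "you"), p("that", "he"))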
Slide 28: ASR as Bayesian Inference
(Diagram: states q1/w1, q2/w1, q3/w1 emitting observations x1, x2, x3, with phone labels such as iy, d, and t, and language-model arcs p(he | that) and p(you | that).)
argmax_W P(W | X)
  = argmax_W P(X | W) P(W) / P(X)
  = argmax_W P(X | W) P(W)
  = argmax_W Σ_Q P(X, Q | W) P(W)
  ≈ argmax_W max_Q P(X, Q | W) P(W)
  ≈ argmax_W max_Q P(X | Q) P(Q | W) P(W)
Slide 29: ASR Probability Models
- Three probability models:
- P(X | Q): acoustic model
- P(Q | W): duration / transition / pronunciation model
- P(W): language model
- The language and pronunciation models are inferred from prior knowledge
- Other models are learned from data (how?)
Slide 30: Parts of an ASR System
(Same diagram as slide 14, with the three models labeled on it: P(X | Q) for acoustic modeling, P(Q | W) for the pronunciation/transition model, and P(W) for language modeling; SEARCH combines them and outputs "The cat chased the dog".)
Slide 31: EM for ASR: The Forward-Backward Algorithm
- Determine state occupancy probabilities
- i.e., assign each data vector to a state
- Calculate new transition probabilities and new means and standard deviations (emission probabilities) using those assignments
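A minimal sketch of the forward-backward computation of state occupancies gamma[t, j] = P(q_t = j | X), assuming an initial distribution pi, a transition matrix A, and a matrix B of per-frame emission likelihoods (illustrative names, not from the slides). Re-estimating means and variances would then weight each frame by these occupancies. No rescaling is done here, so it is only suitable for short toy sequences:

    import numpy as np

    def forward_backward(pi, A, B):
        # pi: (S,) initial probabilities; A: (S, S) transitions;
        # B: (T, S) emission likelihoods P(x_t | state). Returns gamma: (T, S).
        T, S = B.shape
        alpha = np.zeros((T, S))
        beta = np.zeros((T, S))
        alpha[0] = pi * B[0]
        for t in range(1, T):                      # forward pass
            alpha[t] = (alpha[t - 1] @ A) * B[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):             # backward pass
            beta[t] = A @ (B[t + 1] * beta[t + 1])
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)

    # Toy example: 2 states, 3 frames of made-up likelihoods.
    g = forward_backward(np.array([0.6, 0.4]),
                         np.array([[0.7, 0.3], [0.4, 0.6]]),
                         np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]))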
Slide 32: ASR as Bayesian Inference
(Repeat of slide 28: the same diagram and the same derivation, ending in argmax_W max_Q P(X | Q) P(Q | W) P(W), shown again as the setup for the search problem.)
Slide 33: Search
- When trying to find W* = argmax_W P(W | X), we need to look at (in theory):
- All possible word sequences W
- All possible segmentations/alignments of W to X
- Generally, this is done by searching the space of W
- Viterbi search: a dynamic programming approach that looks for the most likely path (sketched below)
- A* search: an alternative method that keeps a stack of hypotheses around
- If W is large, pruning becomes important
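A minimal Viterbi sketch over an HMM, using the same illustrative pi / A / B conventions as the forward-backward sketch above; it returns the most likely state path rather than per-frame occupancies:

    import numpy as np

    def viterbi(pi, A, B):
        # pi: (S,) initial probabilities; A: (S, S) transitions;
        # B: (T, S) emission likelihoods. Returns the most likely state sequence.
        T, S = B.shape
        log_A = np.log(A + 1e-300)
        delta = np.log(pi + 1e-300) + np.log(B[0] + 1e-300)
        backptr = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + log_A            # best way to reach each state
            backptr[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + np.log(B[t] + 1e-300)
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):                  # trace the best path backwards
            path.append(int(backptr[t, path[-1]]))
        return path[::-1]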
Slide 34: How to Train an ASR System
- Have a speech corpus at hand
- It should have word (and preferably phone) transcriptions
- Divide it into training, development, and test sets
- Develop models of prior knowledge:
- Pronunciation dictionary
- Grammar
- Train the acoustic models
- Possibly realigning the corpus phonetically
Slide 35: How to Train an ASR System
- Test on your development data (baseline)
- Think real hard
- Figure out some neat new modification
- Retrain the system component
- Test on your development data
- Lather, rinse, repeat
- Then, at the end of the project, test on the test data.
Slide 36: Judging the Quality of a System
- Usually, ASR performance is judged by the word error rate
- Error Rate = 100 * (Subs + Ins + Dels) / N_words
- Example:
  REF:  I   WANT  TO   GO  HOME  ***
  REC:  **  WANT  TWO  GO  HOME  NOW
  SC:   D   C     S    C   C     I
- 100 * (1 Sub + 1 Ins + 1 Del) / 5 = 60%
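Word error rate is computed from an edit-distance alignment between the reference and the recognized word strings. A minimal sketch that finds the minimum number of substitutions, insertions, and deletions by dynamic programming:

    def word_error_rate(ref, hyp):
        # ref, hyp: lists of words. Returns WER in percent.
        R, H = len(ref), len(hyp)
        d = [[0] * (H + 1) for _ in range(R + 1)]   # d[i][j] = edits between ref[:i], hyp[:j]
        for i in range(R + 1):
            d[i][0] = i                             # deletions
        for j in range(H + 1):
            d[0][j] = j                             # insertions
        for i in range(1, R + 1):
            for j in range(1, H + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return 100.0 * d[R][H] / R

    # The slide's example: 1 substitution + 1 insertion + 1 deletion over 5 reference words.
    print(word_error_rate("I WANT TO GO HOME".split(), "WANT TWO GO HOME NOW".split()))  # 60.0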
Slide 37: Judging the Quality of a System
- Usually, ASR performance is judged by the word error rate
- This assumes that all errors are equal
- There is also a bit of a mismatch between the optimization criterion and the error measurement
- Other (task-specific) measures are sometimes used:
- Task completion
- Concept error rate