Title: More on Hidden Markov Models
1. More on Hidden Markov Models
2. The Noisy Channel Model
- (some slides adapted from slides by Jason Eisner, Rada Mihalcea, Bonnie Dorr, Christof Monz)
3. Our Simple Markovian Tagger
4. The Hidden Markov Model Tagger
5. Parameters of an HMM
- States: a set of states $S = \{s_1, \dots, s_N\}$
- Transition probabilities: $A = \{a_{1,1}, a_{1,2}, \dots, a_{N,N}\}$, where each $a_{i,j}$ represents the probability of transitioning from state $s_i$ to state $s_j$
- Emission probabilities: a set $B$ of functions of the form $b_i(o_t)$, giving the probability of observation $o_t$ being emitted by state $s_i$
- Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state
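As a concrete picture of these three parameter sets, here is a minimal NumPy sketch, assuming N hidden states and M observation symbols; the array names A, B, and pi are illustrative choices, not from the slides:

```python
import numpy as np

# lambda = (A, B, pi) for an HMM with N states and M observation symbols
N, M = 3, 5
A  = np.full((N, N), 1.0 / N)   # A[i, j] = a_{i,j}: P(next state = s_j | current state = s_i)
B  = np.full((N, M), 1.0 / M)   # B[i, k] = b_i(v_k): P(emitting symbol v_k | state s_i)
pi = np.full(N, 1.0 / N)        # pi[i]   = pi_i:    P(starting in state s_i)
```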
6. Recognition using an HMM
7. Noisy Channel in a Picture
8. Noisy Channel Model
real language X
noisy channel X → Y
yucky language Y
want to recover X from Y
9. Noisy Channel Model
real language X (correct spelling)
noisy channel X → Y (typos)
yucky language Y (misspelling)
want to recover X from Y
10. Noisy Channel Model
real language X (lexicon space)
noisy channel X → Y (delete spaces)
yucky language Y (text w/o spaces)
want to recover X from Y
11. Noisy Channel Model
real language X (lexicon space)
noisy channel X → Y (pronunciation)
yucky language Y (speech)
want to recover X from Y
12. Noisy Channel Model
real language X (English)
noisy channel X → Y (English → French)
yucky language Y (French)
want to recover X from Y
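In every one of these instances, recovering X from Y is the same Bayesian decoding problem: score candidate sources by a language model over X and a channel model for how X gets corrupted into Y. The standard noisy-channel decoding rule is:

$$ \hat{x} \;=\; \arg\max_{x} P(x \mid y) \;=\; \arg\max_{x} \frac{P(y \mid x)\, P(x)}{P(y)} \;=\; \arg\max_{x} P(y \mid x)\, P(x) $$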
13. Markovian Tagger = HMM = Noisy Channel!!
14. Review: First Two Basic HMM Problems
- Problem 1 (Evaluation): Given the observation sequence $O = o_1, \dots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we compute the probability of O given the model?
- Problem 2 (Decoding): Given the observation sequence $O = o_1, \dots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we find the state sequence that best explains the observations?
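In symbols, using the parameter notation introduced above, the two review problems ask for:

$$ \text{Evaluation:}\quad P(O \mid \lambda) $$
$$ \text{Decoding:}\quad \hat{Q} = \arg\max_{Q = q_1 \dots q_T} P(Q \mid O, \lambda) $$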
15. The Three Basic HMM Problems
- Problem 3 (Learning): How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?
16. Problem 3: Learning
- Up to now we've assumed that we know the underlying model $\lambda = (A, B, \pi)$
- Often these parameters are estimated on annotated training data, but:
- Annotation is often difficult and/or expensive
- Training data is different from the current data
- We want to maximize the parameters with respect to the current data, i.e., we're looking for a model $\lambda'$ such that $\lambda' = \arg\max_{\lambda} P(O \mid \lambda)$
17. Problem 3: Learning
- Unfortunately, there is no known way to analytically find a global maximum, i.e., a model $\hat{\lambda}$ such that $\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$
- But it is possible to find a local maximum
- Given an initial model $\lambda$, we can always find a model $\lambda'$ such that $P(O \mid \lambda') \geq P(O \mid \lambda)$
18. Forward-Backward (Baum-Welch) algorithm
- Key idea: parameter re-estimation by hill-climbing
- From an arbitrary initial parameter instantiation $\hat{\lambda}$, the FB algorithm iteratively re-estimates the parameters, improving the probability that a given observation sequence was generated by $\hat{\lambda}$
19. Parameter Re-estimation
- Three parameters need to be re-estimated:
- Initial state distribution: $\pi_i$
- Transition probabilities: $a_{i,j}$
- Emission probabilities: $b_i(o_t)$
20. Review
21. The Trellis
22. Forward Probabilities
- What is the probability that, given an HMM $\lambda$, at time t the state is i and the partial observation $o_1 \dots o_t$ has been generated?
- This is the forward probability $\alpha_t(i) = P(o_1 \dots o_t,\; q_t = s_i \mid \lambda)$
23. Forward Probabilities
24. Forward Algorithm
- Initialization
- Induction
- Termination
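Written out, these three steps form the standard forward recursion over the trellis:

$$ \text{Initialization:}\quad \alpha_1(i) = \pi_i \, b_i(o_1), \qquad 1 \le i \le N $$
$$ \text{Induction:}\quad \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{i,j} \Big]\, b_j(o_{t+1}), \qquad 1 \le t \le T-1,\; 1 \le j \le N $$
$$ \text{Termination:}\quad P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) $$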
25. Forward Algorithm Complexity
- The naïve approach takes $O(2T \cdot N^T)$ computation
- The forward algorithm, using dynamic programming, takes $O(N^2 T)$ computations
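To make the $O(N^2 T)$ claim concrete, here is a minimal NumPy sketch of the dynamic-programming forward pass; the function and array names are illustrative, with A, B, and pi laid out as in the parameter sketch above and obs a sequence of integer symbol indices:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Return P(O | lambda) computed with the forward algorithm in O(N^2 * T)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization: alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):                             # induction: one O(N^2) step per time point
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t)
    return alpha[-1].sum()                            # termination: sum_i alpha_T(i)
```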
26. Backward Probabilities
- What is the probability that, given an HMM $\lambda$ and given that the state at time t is i, the partial observation $o_{t+1} \dots o_T$ is generated?
- This is the backward probability $\beta_t(i) = P(o_{t+1} \dots o_T \mid q_t = s_i, \lambda)$
- Analogous to the forward probability, just in the other direction
27. Backward Probabilities
28. Backward Algorithm
- Initialization
- Induction
- Termination
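For reference, the corresponding standard backward recursion for these steps:

$$ \text{Initialization:}\quad \beta_T(i) = 1, \qquad 1 \le i \le N $$
$$ \text{Induction:}\quad \beta_t(i) = \sum_{j=1}^{N} a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad t = T-1, \dots, 1 $$
$$ \text{Termination:}\quad P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i) $$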
29. Re-estimating Transition Probabilities
- What's the probability of being in state $s_i$ at time t and going to state $s_j$, given the current model and parameters?
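This quantity is usually written $\xi_t(i,j)$ and can be computed from the forward and backward probabilities:

$$ \xi_t(i,j) = P(q_t = s_i,\; q_{t+1} = s_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} $$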
30. Re-estimating Transition Probabilities
31. Re-estimating Transition Probabilities
- The intuition behind the re-estimation equation for transition probabilities is: the expected number of transitions from state $s_i$ to state $s_j$, divided by the expected number of transitions out of state $s_i$
- Formally, this ratio is computed from $\xi_t(i,j)$ as defined above
32. Re-estimating Transition Probabilities
- Defining $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)$ as the probability of being in state $s_i$, given the complete observation O
- We can say: $\hat{a}_{i,j} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
33. Re-estimating Initial State Probabilities
- Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state
- Re-estimation is easy: it is the expected frequency of being in state $s_i$ at time $t = 1$
- Formally: $\hat{\pi}_i = \gamma_1(i)$
34. Re-estimation of Emission Probabilities
- Emission probabilities are re-estimated as the expected number of times in state $s_i$ observing symbol $v_k$, divided by the expected number of times in state $s_i$
- Formally, this is written with the Kronecker delta, as shown below
- Note that $\delta$ here is the Kronecker delta function and is not related to the $\delta$ in the discussion of the Viterbi algorithm!!
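The re-estimation formula referred to above, in its standard form:

$$ \hat{b}_i(k) = \frac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad \text{where } \delta(o_t, v_k) = 1 \text{ if } o_t = v_k \text{ and } 0 \text{ otherwise} $$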
35. The Updated Model
- Coming from $\lambda = (A, B, \pi)$ we get to $\lambda' = (\hat{A}, \hat{B}, \hat{\pi})$ by the update rules for $\hat{a}_{i,j}$, $\hat{b}_i(k)$, and $\hat{\pi}_i$ given above
36. Expectation Maximization
- The forward-backward algorithm is an instance of the more general EM algorithm
- The E step: compute the forward and backward probabilities for a given model
- The M step: re-estimate the model parameters
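As a compact, illustrative sketch of one such EM iteration on a single observation sequence, assuming the NumPy parameter layout used earlier; in practice one would scale the probabilities to avoid underflow and accumulate the counts over many training sequences:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One forward-backward (EM) update; returns the re-estimated (pi, A, B)."""
    obs = np.asarray(obs)
    N, M = B.shape
    T = len(obs)

    # E step: forward and backward probabilities under the current model
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob_O = alpha[-1].sum()                                   # P(O | lambda)

    # xi[t, i, j] = P(q_t = s_i, q_{t+1} = s_j | O, lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob_O
    gamma = alpha * beta / prob_O                              # gamma[t, i] = P(q_t = s_i | O, lambda)

    # M step: re-estimate the three parameter sets
    new_pi = gamma[0]                                          # pi_i  = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # a_ij  = sum_t xi_t(i,j) / sum_t gamma_t(i)
    new_B = np.zeros_like(B)
    for k in range(M):                                         # b_i(k) = sum_{t: o_t = v_k} gamma_t(i) / sum_t gamma_t(i)
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```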