Title: More on Hidden Markov Models
1. More on Hidden Markov Models
2. The Noisy Channel Model
- (some slides adapted from slides by Jason Eisner, Rada Mihalcea, Bonnie Dorr, Christof Monz)
3. Our Simple Markovian Tagger
4. The Hidden Markov Model Tagger
5. Parameters of an HMM
- States: a set of states $S = \{s_1, \dots, s_N\}$
- Transition probabilities: $A = \{a_{1,1}, a_{1,2}, \dots, a_{N,N}\}$, where each $a_{i,j}$ represents the probability of transitioning from state $s_i$ to state $s_j$
- Emission probabilities: a set $B$ of functions of the form $b_i(o_t)$, giving the probability of observation $o_t$ being emitted by state $s_i$
- Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state
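As a concrete picture of these three parameter sets, here is a minimal NumPy sketch, assuming N hidden states and M observation symbols; the array names A, B, and pi are illustrative choices, not from the slides:

```python
import numpy as np

# lambda = (A, B, pi) for an HMM with N states and M observation symbols
N, M = 3, 5
A  = np.full((N, N), 1.0 / N)   # A[i, j] = a_{i,j}: P(next state = s_j | current state = s_i)
B  = np.full((N, M), 1.0 / M)   # B[i, k] = b_i(v_k): P(emitting symbol v_k | state s_i)
pi = np.full(N, 1.0 / N)        # pi[i]   = pi_i:    P(starting in state s_i)
```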
6. Recognition using an HMM
7. Noisy Channel in a Picture
8. Noisy Channel Model
real language X
noisy channel X → Y
yucky language Y
want to recover X from Y
9. Noisy Channel Model
real language X (correct spelling)
noisy channel X → Y (typos)
yucky language Y (misspelling)
want to recover X from Y
10. Noisy Channel Model
real language X (lexicon space)
noisy channel X → Y (delete spaces)
yucky language Y (text w/o spaces)
want to recover X from Y
11. Noisy Channel Model
real language X (lexicon space)
noisy channel X → Y (pronunciation)
yucky language Y (speech)
want to recover X from Y
12. Noisy Channel Model
real language X (English)
noisy channel X → Y (English → French)
yucky language Y (French)
want to recover X from Y
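In every one of these instances, recovering X from Y is the same Bayesian decoding problem: score candidate sources by a language model over X and a channel model for how X gets corrupted into Y. The standard noisy-channel decoding rule is:

$$ \hat{x} \;=\; \arg\max_{x} P(x \mid y) \;=\; \arg\max_{x} \frac{P(y \mid x)\, P(x)}{P(y)} \;=\; \arg\max_{x} P(y \mid x)\, P(x) $$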
13. Markovian Tagger = HMM = Noisy Channel!!
14. Review: First Two Basic HMM Problems
- Problem 1 (Evaluation): Given the observation sequence $O = o_1, \dots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we compute the probability of O given the model?
- Problem 2 (Decoding): Given the observation sequence $O = o_1, \dots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we find the state sequence that best explains the observations?
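In symbols, using the parameter notation introduced above, the two review problems ask for:

$$ \text{Evaluation:}\quad P(O \mid \lambda) $$
$$ \text{Decoding:}\quad \hat{Q} = \arg\max_{Q = q_1 \dots q_T} P(Q \mid O, \lambda) $$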
15. The Three Basic HMM Problems
- Problem 3 (Learning): How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?
16. Problem 3: Learning
- Up to now we've assumed that we know the underlying model $\lambda = (A, B, \pi)$
- Often these parameters are estimated on annotated training data, but:
- Annotation is often difficult and/or expensive
- Training data is different from the current data
- We want to maximize the parameters with respect to the current data, i.e., we're looking for a model $\lambda'$ such that $\lambda' = \arg\max_{\lambda} P(O \mid \lambda)$
17. Problem 3: Learning
- Unfortunately, there is no known way to analytically find a global maximum, i.e., a model $\hat{\lambda}$ such that $\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$
- But it is possible to find a local maximum
- Given an initial model $\lambda$, we can always find a model $\lambda'$ such that $P(O \mid \lambda') \geq P(O \mid \lambda)$
18. Forward-Backward (Baum-Welch) algorithm
- Key idea: parameter re-estimation by hill-climbing
- From an arbitrary initial parameter instantiation $\hat{\lambda}$, the FB algorithm iteratively re-estimates the parameters, improving the probability that a given observation sequence was generated by $\hat{\lambda}$
19. Parameter Re-estimation
- Three parameters need to be re-estimated:
- Initial state distribution: $\pi_i$
- Transition probabilities: $a_{i,j}$
- Emission probabilities: $b_i(o_t)$
20. Review
21. The Trellis
22. Forward Probabilities
- What is the probability that, given an HMM $\lambda$, at time t the state is i and the partial observation $o_1 \dots o_t$ has been generated?
- This is the forward probability $\alpha_t(i) = P(o_1 \dots o_t,\; q_t = s_i \mid \lambda)$
23. Forward Probabilities
24. Forward Algorithm
- Initialization
- Induction
- Termination
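Written out, these three steps form the standard forward recursion over the trellis:

$$ \text{Initialization:}\quad \alpha_1(i) = \pi_i \, b_i(o_1), \qquad 1 \le i \le N $$
$$ \text{Induction:}\quad \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{i,j} \Big]\, b_j(o_{t+1}), \qquad 1 \le t \le T-1,\; 1 \le j \le N $$
$$ \text{Termination:}\quad P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) $$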
25. Forward Algorithm Complexity
- The naïve approach takes $O(2T \cdot N^T)$ computation
- The forward algorithm, using dynamic programming, takes $O(N^2 T)$ computations
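To make the $O(N^2 T)$ claim concrete, here is a minimal NumPy sketch of the dynamic-programming forward pass; the function and array names are illustrative, with A, B, and pi laid out as in the parameter sketch above and obs a sequence of integer symbol indices:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Return P(O | lambda) computed with the forward algorithm in O(N^2 * T)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization: alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):                             # induction: one O(N^2) step per time point
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t)
    return alpha[-1].sum()                            # termination: sum_i alpha_T(i)
```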
26. Backward Probabilities
- What is the probability that, given an HMM $\lambda$ and given that the state at time t is i, the partial observation $o_{t+1} \dots o_T$ is generated?
- This is the backward probability $\beta_t(i) = P(o_{t+1} \dots o_T \mid q_t = s_i, \lambda)$
- Analogous to the forward probability, just in the other direction
27. Backward Probabilities
28. Backward Algorithm
- Initialization
- Induction
- Termination
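For reference, the corresponding standard backward recursion for these steps:

$$ \text{Initialization:}\quad \beta_T(i) = 1, \qquad 1 \le i \le N $$
$$ \text{Induction:}\quad \beta_t(i) = \sum_{j=1}^{N} a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad t = T-1, \dots, 1 $$
$$ \text{Termination:}\quad P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i) $$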
29. Re-estimating Transition Probabilities
- What's the probability of being in state $s_i$ at time t and going to state $s_j$, given the current model and parameters?
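This quantity is usually written $\xi_t(i,j)$ and can be computed from the forward and backward probabilities:

$$ \xi_t(i,j) = P(q_t = s_i,\; q_{t+1} = s_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} $$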
30. Re-estimating Transition Probabilities
31. Re-estimating Transition Probabilities
- The intuition behind the re-estimation equation for transition probabilities is: the expected number of transitions from state $s_i$ to state $s_j$, divided by the expected number of transitions out of state $s_i$
- Formally, this ratio is computed from $\xi_t(i,j)$ as defined above
32. Re-estimating Transition Probabilities
- Defining $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)$ as the probability of being in state $s_i$, given the complete observation O
- We can say: $\hat{a}_{i,j} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
33. Re-estimating Initial State Probabilities
- Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state
- Re-estimation is easy: it is the expected frequency of being in state $s_i$ at time $t = 1$
- Formally: $\hat{\pi}_i = \gamma_1(i)$
34. Re-estimation of Emission Probabilities
- Emission probabilities are re-estimated as the expected number of times in state $s_i$ observing symbol $v_k$, divided by the expected number of times in state $s_i$
- Formally, this is written with the Kronecker delta, as shown below
- Note that $\delta$ here is the Kronecker delta function and is not related to the $\delta$ in the discussion of the Viterbi algorithm!!
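The re-estimation formula referred to above, in its standard form:

$$ \hat{b}_i(k) = \frac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad \text{where } \delta(o_t, v_k) = 1 \text{ if } o_t = v_k \text{ and } 0 \text{ otherwise} $$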
35. The Updated Model
- Coming from $\lambda = (A, B, \pi)$ we get to $\lambda' = (\hat{A}, \hat{B}, \hat{\pi})$ by the update rules for $\hat{a}_{i,j}$, $\hat{b}_i(k)$, and $\hat{\pi}_i$ given above
36. Expectation Maximization
- The forward-backward algorithm is an instance of the more general EM algorithm
- The E step: compute the forward and backward probabilities for a given model
- The M step: re-estimate the model parameters
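As a compact, illustrative sketch of one such EM iteration on a single observation sequence, assuming the NumPy parameter layout used earlier; in practice one would scale the probabilities to avoid underflow and accumulate the counts over many training sequences:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One forward-backward (EM) update; returns the re-estimated (pi, A, B)."""
    obs = np.asarray(obs)
    N, M = B.shape
    T = len(obs)

    # E step: forward and backward probabilities under the current model
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob_O = alpha[-1].sum()                                   # P(O | lambda)

    # xi[t, i, j] = P(q_t = s_i, q_{t+1} = s_j | O, lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob_O
    gamma = alpha * beta / prob_O                              # gamma[t, i] = P(q_t = s_i | O, lambda)

    # M step: re-estimate the three parameter sets
    new_pi = gamma[0]                                          # pi_i  = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # a_ij  = sum_t xi_t(i,j) / sum_t gamma_t(i)
    new_B = np.zeros_like(B)
    for k in range(M):                                         # b_i(k) = sum_{t: o_t = v_k} gamma_t(i) / sum_t gamma_t(i)
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```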