1
More on Hidden Markov Models
2
The Noisy Channel Model
  • (some slides adapted from slides by Jason Eisner,
    Rada Mihalcea, Bonnie Dorr & Christof Monz)

3
Our Simple Markovian Tagger
4
The Hidden Markov Model Tagger
5
Parameters of an HMM
  • States: a set of states S = {s_1, ..., s_n}
  • Transition probabilities: A = {a_{1,1}, a_{1,2}, ..., a_{n,n}}.
    Each a_{i,j} represents the probability of
    transitioning from state s_i to state s_j.
  • Emission probabilities: a set B of functions of
    the form b_i(o_t), the probability of observation
    o_t being emitted by state s_i
  • Initial state distribution: π_i is the
    probability that s_i is a start state
    (a small parameter container is sketched below)

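To make these parameters concrete, the following is a minimal Python/NumPy sketch of a container for λ = (A, B, π). The class name, array layout, and the toy numbers are our own illustration (not from the slides) and are reused by the later sketches.

import numpy as np

class HMM:
    """Hypothetical container for the HMM parameters lambda = (A, B, pi)."""
    def __init__(self, A, B, pi):
        self.A = np.asarray(A, dtype=float)    # A[i, j] = P(next state s_j | current state s_i), shape (N, N)
        self.B = np.asarray(B, dtype=float)    # B[i, k] = b_i(o_t = v_k), shape (N, M) for M observation symbols
        self.pi = np.asarray(pi, dtype=float)  # pi[i]   = P(start in state s_i), shape (N,)

# A toy 2-state, 3-symbol model with made-up numbers
hmm = HMM(A=[[0.7, 0.3], [0.4, 0.6]],
          B=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
          pi=[0.6, 0.4])
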
6
Recognition using an HMM
7
Noisy Channel in a Picture

8
Noisy Channel Model
real language X
noisy channel X → Y
yucky language Y
want to recover X from Y
9
Noisy Channel Model
real language X
correct spelling
typos
noisy channel X → Y
yucky language Y
misspelling
want to recover X from Y
10
Noisy Channel Model
real language X
(lexicon space)
delete spaces
noisy channel X → Y
yucky language Y
text w/o spaces
want to recover X from Y
11
Noisy Channel Model
real language X
(lexicon space)
pronunciation
noisy channel X → Y
yucky language Y
speech
want to recover X from Y
12
Noisy Channel Model
real language X
English
English → French
noisy channel X → Y
yucky language Y
French
want to recover X from Y
13
Markovian Tagger = HMM = Noisy Channel!!
14
Review: First Two Basic HMM Problems
  • Problem 1 (Evaluation): Given the observation
    sequence O = o_1, ..., o_T and an HMM model
    λ = (A, B, π), how do we compute the probability
    of O given the model?
  • Problem 2 (Decoding): Given the observation
    sequence O = o_1, ..., o_T and an HMM model
    λ = (A, B, π), how do we find the state sequence
    that best explains the observations?
    (a decoding sketch follows below)

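For Problem 2, the standard decoder is the Viterbi algorithm (its δ is mentioned again on the emission re-estimation slide). Below is a minimal sketch, assuming the hypothetical HMM container introduced earlier and observations given as 0-based symbol indices; it is an illustration, not code from the slides.

def viterbi(hmm, obs):
    """Most likely state sequence for the observations (Problem 2, Decoding)."""
    N, T = hmm.A.shape[0], len(obs)
    delta = np.zeros((T, N))           # delta[t, i] = best score of any path ending in state i at time t
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    delta[0] = hmm.pi * hmm.B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * hmm.A        # scores[i, j] = delta[t-1, i] * a_{i,j}
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * hmm.B[:, obs[t]]
    path = [int(delta[T - 1].argmax())]               # best final state
    for t in range(T - 1, 0, -1):                     # follow the back-pointers
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path))
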
15
The Three Basic HMM Problems
  • Problem 3 (Learning): How do we adjust the model
    parameters λ = (A, B, π) to maximize P(O | λ)?

16
Problem 3 Learning
  • Up to now we've assumed that we know the
    underlying model λ = (A, B, π)
  • Often these parameters are estimated on annotated
    training data, but:
  • Annotation is often difficult and/or expensive
  • Training data is different from the current data
  • We want to maximize the parameters with respect
    to the current data, i.e., we're looking for a
    model λ' such that λ' = argmax_λ P(O | λ)

17
Problem 3 Learning
  • Unfortunately, there is no known way to
    analytically find a global maximum, i.e., a model
    λ' such that λ' = argmax_λ P(O | λ)
  • But it is possible to find a local maximum
  • Given an initial model λ, we can always find a
    model λ' such that P(O | λ') ≥ P(O | λ)

18
Forward-Backward (Baum-Welch) algorithm
  • Key idea: parameter re-estimation by
    hill-climbing
  • From an arbitrary initial parameter instantiation
    λ, the FB algorithm iteratively re-estimates
    the parameters, improving the probability that the
    given observation sequence was generated by the model

19
Parameter Re-estimation
  • Three parameters need to be re-estimated:
  • Initial state distribution: π_i
  • Transition probabilities: a_{i,j}
  • Emission probabilities: b_i(o_t)

20
Review
21
The Trellis
22
Forward Probabilities
  • What is the probability that, given an HMM λ, at
    time t the state is s_i and the partial
    observation o_1 ... o_t has been generated?

23
Forward Probabilities
  • α_t(i) = P(o_1 ... o_t, q_t = s_i | λ), where q_t
    denotes the state at time t
24
Forward Algorithm
  • Initialization: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
  • Induction: α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_{i,j}] b_j(o_{t+1}),
    1 ≤ t ≤ T-1, 1 ≤ j ≤ N
  • Termination: P(O | λ) = Σ_{i=1..N} α_T(i)
    (a code sketch follows below)

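A minimal sketch of the forward recursion, assuming NumPy and the hypothetical HMM container from the earlier sketch (alpha uses 0-based time indices, so alpha[t] holds α_{t+1} in the slides' 1-based notation).

def forward(hmm, obs):
    """Forward probabilities alpha and the total observation probability P(O | lambda)."""
    N, T = hmm.A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = hmm.pi * hmm.B[:, obs[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ hmm.A) * hmm.B[:, obs[t]]  # induction
    return alpha, alpha[T - 1].sum()                          # termination: P(O | lambda)

Each induction step is an O(N²) matrix-vector product, which gives the overall O(N²T) cost noted on the next slide.
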
25
Forward Algorithm Complexity
  • The naïve approach takes O(2T · N^T) computations
  • The forward algorithm using dynamic programming
    takes O(N² T) computations

26
Backward Probabilities
  • What is the probability that, given an HMM λ and
    given that the state at time t is s_i, the partial
    observation o_{t+1} ... o_T is generated?
  • Analogous to the forward probability, just in the
    other direction

27
Backward Probabilities
  • β_t(i) = P(o_{t+1} ... o_T | q_t = s_i, λ)

28
Backward Algorithm
  • Initialization: β_T(i) = 1, 1 ≤ i ≤ N
  • Induction: β_t(i) = Σ_{j=1..N} a_{i,j} b_j(o_{t+1}) β_{t+1}(j),
    t = T-1, ..., 1
  • Termination: P(O | λ) = Σ_{i=1..N} π_i b_i(o_1) β_1(i)
    (a code sketch follows below)

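A matching sketch of the backward recursion, under the same assumptions as the forward sketch.

def backward(hmm, obs):
    """Backward probabilities: beta[t, i] = P(o_{t+1}..o_T | state i at time t, lambda)."""
    N, T = hmm.A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                           # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = hmm.A @ (hmm.B[:, obs[t + 1]] * beta[t + 1])  # induction
    return beta
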
29
Re-estimating Transition Probabilities
  • What's the probability of being in state s_i at
    time t and going to state s_j, given the current
    model and parameters?

30
Re-estimating Transition Probabilities
  • ξ_t(i,j) = P(q_t = s_i, q_{t+1} = s_j | O, λ)
    = α_t(i) a_{i,j} b_j(o_{t+1}) β_{t+1}(j) / P(O | λ)

31
Re-estimating Transition Probabilities
  • The intuition behind the re-estimation equation
    for transition probabilities is:
    a'_{i,j} = expected number of transitions from s_i to s_j
               / expected number of transitions from s_i
  • Formally:
    a'_{i,j} = Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} Σ_{k=1..N} ξ_t(i,k)

32
Re-estimating Transition Probabilities
  • Defining γ_t(i) = Σ_{j=1..N} ξ_t(i,j)
  • as the probability of being in state s_i at time t,
    given the complete observation O
  • we can say:
    a'_{i,j} = Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} γ_t(i)
    (a code sketch of ξ and γ follows below)

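A sketch of how ξ_t(i,j) and γ_t(i) could be computed from the forward and backward probabilities above; again an illustration under the same assumptions, not the slides' own code.

def xi_gamma(hmm, obs, alpha, beta):
    """E-step quantities: xi[t, i, j] and gamma[t, i], given alpha and beta."""
    N, T = hmm.A.shape[0], len(obs)
    prob_O = alpha[T - 1].sum()                               # P(O | lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i,j) = alpha_t(i) a_{i,j} b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
        xi[t] = (alpha[t][:, None] * hmm.A
                 * hmm.B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]) / prob_O
    gamma = alpha * beta / prob_O                             # gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)
    return xi, gamma
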
33
Re-estimating Initial State Probabilities
  • Initial state distribution: π_i is the
    probability that s_i is a start state
  • Re-estimation is easy:
    π'_i = expected frequency in state s_i at time t = 1
  • Formally: π'_i = γ_1(i)

34
Re-estimation of Emission Probabilities
  • Emission probabilities are re-estimated as:
    b'_i(k) = expected number of times in state s_i observing symbol v_k
              / expected number of times in state s_i
  • Formally:
    b'_i(k) = Σ_{t=1..T} δ(o_t, v_k) γ_t(i) / Σ_{t=1..T} γ_t(i),
    where δ(o_t, v_k) = 1 if o_t = v_k, and 0 otherwise
    (a re-estimation sketch follows below)
  • Note that δ here is the Kronecker delta
    function and is not related to the δ in the
    discussion of the Viterbi algorithm!!

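A sketch of the M-step, collecting the three re-estimation rules; it assumes the hypothetical HMM container, xi, and gamma from the earlier sketches.

def reestimate(hmm, obs, xi, gamma):
    """M-step: re-estimated initial, transition, and emission parameters."""
    N, M = hmm.B.shape
    pi_new = gamma[0]                                          # pi'_i = gamma_1(i)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # sum_t xi_t(i,j) / sum_t gamma_t(i)
    B_new = np.zeros((N, M))
    obs = np.asarray(obs)
    for k in range(M):
        # delta(o_t, v_k) selects the time steps where symbol v_k was observed
        B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return HMM(A_new, B_new, pi_new)
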
35
The Updated Model
  • Coming from λ = (A, B, π), we get to λ' = (A', B', π')
    by the following update rules:
  • π'_i = γ_1(i)
  • a'_{i,j} = Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} γ_t(i)
  • b'_i(k) = Σ_{t=1..T} δ(o_t, v_k) γ_t(i) / Σ_{t=1..T} γ_t(i)

36
Expectation Maximization
  • The forward-backward algorithm is an instance of
    the more general EM algorithm
  • The E-step: compute the forward and backward
    probabilities for a given model
  • The M-step: re-estimate the model parameters
    (a full iteration loop is sketched below)
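
Tying the sketches together, one possible EM loop; for simplicity it runs a fixed number of iterations, whereas a real implementation would typically monitor the change in P(O | λ) instead.

def baum_welch(hmm, obs, iterations=10):
    """Forward-backward (Baum-Welch) training: alternate E-step and M-step."""
    for _ in range(iterations):
        alpha, _ = forward(hmm, obs)                 # E-step: forward ...
        beta = backward(hmm, obs)                    # ... and backward probabilities
        xi, gamma = xi_gamma(hmm, obs, alpha, beta)
        hmm = reestimate(hmm, obs, xi, gamma)        # M-step: re-estimate the parameters
    return hmm

# Example with the toy model and a made-up observation sequence
trained = baum_welch(hmm, [0, 1, 2, 2, 1, 0], iterations=5)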