Visible Markov Models


1
Hidden Markov Models (HMMs)
  • First: Visible Markov Models (VMMs)
  • Formal Definition
  • Recognition
  • Training
  • HMMs
  • Formal Definition
  • Recognition
  • Training
  • Trellis Algorithms
  • Forward-Backward
  • Viterbi

2
Visible Markov Models
  • Probabilistic Automaton
  • N distinct states S = {s1, ..., sN}
  • M-element output alphabet K = {k1, ..., kM}
  • Initial state probabilities Π = {πi}, i ∈ S
  • State transitions at t = 1, 2, ...
  • State transition probabilities A = {aij}, i, j ∈ S
  • State sequence X = (X1, ..., XT), Xt ∈ S
  • Output sequence O = (o1, ..., oT), ot ∈ K
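A minimal sketch of this definition in Python; the two-state weather
model below is a hypothetical illustration, not taken from the slides:

    from dataclasses import dataclass

    @dataclass
    class VMM:
        states: list[str]        # S = {s1, ..., sN}
        alphabet: list[str]      # K = {k1, ..., kM}
        pi: list[float]          # pi[i] = P(X1 = si)
        A: list[list[float]]     # A[i][j] = P(X_{t+1} = sj | X_t = si)

    # Toy two-state weather model (illustrative numbers)
    weather = VMM(
        states=["rain", "sun"],
        alphabet=["rain", "sun"],  # the visible output names the state
        pi=[0.4, 0.6],
        A=[[0.7, 0.3],
           [0.2, 0.8]],
    )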

3
VMM Weather Example
4
Generative VMM
  • We choose the state sequence probabilistically
  • Let's try this using
  • the numbers 1-10
  • drawing from a hat
  • an ad-hoc assignment scheme
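A sketch of this generative step in Python, reusing the hypothetical
weather model defined above (random.choices plays the role of the hat):

    import random

    def generate(vmm, T):
        """Sample a state sequence X1..XT from pi and A; in a visible
        model the output sequence is the state sequence itself."""
        x = random.choices(range(len(vmm.states)), weights=vmm.pi)[0]
        seq = [x]
        for _ in range(T - 1):
            x = random.choices(range(len(vmm.states)), weights=vmm.A[x])[0]
            seq.append(x)
        return [vmm.states[i] for i in seq]

    print(generate(weather, 5))  # e.g. ['sun', 'sun', 'rain', 'rain', 'sun']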

5
2 Questions
  • Training Problem
  • Given an observation sequence O and a space of
    possible models, which spans possible values for
    the model parameters w = (A, Π), how do we find
    the model that best explains the observed data?
  • Recognition (decoding) problem
  • Given a model wi = (A, Π), how do we compute how
    likely a certain observation is, i.e. P(O | wi)?

6
Training VMMs
  • Given observation sequences Os, we want to find
    model parameters w = (A, Π) which best explain
    the observations
  • I.e. we want to find the values of w = (A, Π)
    that maximise P(O | w)
  • (A, Π)chosen = argmax_{A, Π} P(O | A, Π)

7
Training VMMs
  • Straightforward for VMMs
  • πi = frequency in state i at time t = 1
  • aij = (number of transitions from state i to state j)
         / (number of transitions from state i)
  • equivalently, aij = (number of transitions from state i to state j)
         / (number of times in state i)
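A sketch of these relative-frequency estimates in Python, assuming the
training data is a list of fully visible state sequences (function and
variable names are hypothetical):

    from collections import Counter

    def train(state_seqs, states):
        """Estimate pi and A by counting, as in the formulas above."""
        starts = Counter(seq[0] for seq in state_seqs)
        trans = Counter(pair for seq in state_seqs
                        for pair in zip(seq, seq[1:]))
        from_i = Counter()                 # transitions out of each state
        for (a, _b), c in trans.items():
            from_i[a] += c
        pi = [starts[s] / len(state_seqs) for s in states]
        A = [[trans[(si, sj)] / from_i[si] if from_i[si] else 0.0
              for sj in states] for si in states]
        return pi, A

    pi, A = train([["sun", "sun", "rain"], ["rain", "sun", "sun"]],
                  ["rain", "sun"])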

8
Recognition
  • We need to calculate P(O | wi)
  • P(O | wi) is handy for calculating P(wi | O)
  • If we have a set of models L = {w1, w2, ..., wV},
    then if we can calculate P(wi | O) we can choose
    the model which returns the highest probability, i.e.
  • wchosen = argmax_{wi ∈ L} P(wi | O)
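For a visible model this computation is direct: assuming, as the
training slide does, that the state sequence can be read off from the
observations, P(O | w) = π_{o1} · a_{o1,o2} · ... · a_{oT-1,oT}. A
sketch in Python, in the log domain to avoid numerical underflow
(assumes every needed probability is nonzero):

    import math

    def log_prob(vmm, obs):
        """log P(O | w) when the observations name the states:
        P = pi[x1] * a[x1][x2] * ... * a[x_{T-1}][x_T]."""
        idx = {s: i for i, s in enumerate(vmm.states)}
        x = [idx[o] for o in obs]
        lp = math.log(vmm.pi[x[0]])
        for a, b in zip(x, x[1:]):
            lp += math.log(vmm.A[a][b])
        return lp

    print(log_prob(weather, ["sun", "sun", "rain"]))  # log(0.6*0.8*0.2)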

9
Recognition
  • Why is P(O | wi) of use?
  • Let's revisit speech for a moment.
  • In speech we are given a sequence of
    observations, e.g. a series of LPC vectors
  • E.g. LP coefficients taken from frames of length
    20-40 ms, every 10-20 ms
  • If we have a set of models L = {w1, w2, ..., wV},
    then if we can calculate P(wi | O) we can choose
    the model which returns the highest probability, i.e.
  • wchosen = argmax_{wi ∈ L} P(wi | O)

10
wchosen = argmax_{wi ∈ L} P(wi | O)
  • P(wi | O) is difficult to calculate directly, as
    we would have to have a model for every possible
    observation sequence O
  • Use Bayes' rule
  • P(x | y) = P(y | x) P(x) / P(y)
  • So now we have
  • wchosen = argmax_{wi ∈ L} P(O | wi) P(wi) / P(O)
  • P(wi) can be easily calculated
  • P(O) is the same for each calculation and so can
    be ignored
  • So P(O | wi) is the key!!!
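A sketch of this selection rule in Python, reusing log_prob from the
recognition sketch above; models and priors are hypothetical parallel
lists of candidate VMMs and their prior probabilities P(wi):

    import math

    def choose(models, priors, obs):
        """argmax_{wi} P(O | wi) P(wi); the constant P(O) is dropped."""
        scores = [log_prob(w, obs) + math.log(p)
                  for w, p in zip(models, priors)]
        best = max(range(len(models)), key=scores.__getitem__)
        return models[best]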