Title: EM algorithm and applications
1 EM algorithm and applications
2 Relative Entropy
- Let p, q be two probability distributions on the same sample space. The relative entropy between p and q is defined by
  H(p||q) = Σx p(x) log(p(x)/q(x)) = Σx p(x) log(1/q(x)) − Σx p(x) log(1/p(x)).
- It measures the inefficiency of assuming distribution q when the correct distribution is p.
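A minimal sketch in Python (an illustration, not part of the slides), assuming finite distributions given as dicts, with q(x) > 0 wherever p(x) > 0:

```python
import math

# Relative entropy H(p||q) for finite distributions given as dicts
# mapping sample points to probabilities.
def relative_entropy(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

p = {'H': 0.5, 'T': 0.5}
q = {'H': 0.25, 'T': 0.75}
print(relative_entropy(p, q))   # > 0; equals 0 only when p == q
```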
3 EM algorithm: approximating MLE from Incomplete Data
- Finding the MLE parameters is a nonlinear optimization problem: maximize log P(x|λ).
- [Figure: log P(x|λ) and the auxiliary function E[log P(x,y|λ)] plotted as functions of λ.]
- General idea of EM: use the current point λ to construct an alternative function Qλ (which is "nice"); if Qλ(λ*) > Qλ(λ), then λ* has higher likelihood than λ.
4 The EM algorithm
- Consider a model where, for observed data x and model parameters λ:
  p(x|λ) = Σy p(x,y|λ)
- (y are the hidden data).
- The EM algorithm receives x and parameters λ, and returns new parameters λ*, s.t. p(x|λ*) > p(x|λ).
5 The EM algorithm
- Finding λ* which maximizes
  p(x|λ*) = Σy p(x,y|λ*)
- is equivalent to finding λ* which maximizes the logarithm
  log p(x|λ*) = log(Σy p(x,y|λ*)),
- which is what the EM algorithm attempts to do.
6 The EM algorithm
- In each iteration the EM algorithm does the following:
- (E step) Calculate
  Qλ(λ*) = Σy p(y|x,λ) log p(x,y|λ*)
- (M step) Find the λ* which maximizes Qλ(λ*)
- (The next iteration sets λ ← λ* and repeats).
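The iteration structure can be summarized in a short sketch. The callables e_step and m_step below are hypothetical stand-ins for the model-specific computations, not the lecture's notation:

```python
# A minimal sketch of the generic EM loop. e_step(x, lam) returns the
# expected sufficient statistics under p(y|x, lam); m_step(stats)
# returns the lam* that maximizes Q_lam(lam*).
def em(x, lam, e_step, m_step, n_iters=100):
    for _ in range(n_iters):
        stats = e_step(x, lam)   # E step
        lam = m_step(stats)      # M step; next iteration uses lam <- lam*
    return lam
```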
7 Example: Baum-Welch EM for HMM
- The Baum-Welch algorithm is the EM algorithm for HMMs.
- E step for HMM: Qλ(λ*) = Σs p(s|x,λ) log p(s,x|λ*), where λ* are the new parameters {akl, ek(b)}.
- (The Akl and Ek(b) are the counts of state transitions and symbol emissions in (s,x).)
8 Baum-Welch EM for HMM
- M step for HMM: Find λ* which maximizes Qλ(λ*). As we proved, λ* is given by the relative frequencies of the Akl's and the Ek(b)'s.
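Written out for concreteness (the standard Baum-Welch update; the slide states it only in words), the relative-frequency M-step is:

```latex
a^*_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad
e^*_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}.
```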
9 A simplest example: EM for 2 coin tosses
- Given a coin with two possible outcomes, H (head) and T (tail), with probabilities qH and qT = 1 − qH.
- The coin is tossed twice, but only the 1st outcome, T, is seen. So the data is x = (T, ·).
- We wish to apply the EM algorithm to get parameters that increase the likelihood of the data.
- Let the initial parameters be λ = (qH, qT) = (¼, ¾).
10 EM for 2 coin tosses
The hidden data which can produce x are the sequences y1 = (T,H) and y2 = (T,T). Hence the likelihood of x with parameters (qH, qT) is
  p(x|λ) = P(x,y1|λ) + P(x,y2|λ) = qH·qT + qT².
For the initial parameters λ = (¼, ¾), we have
  p(x|λ) = ¾·¼ + ¾·¾ = ¾.
- Note that in this case P(x,yi|λ) = P(yi|λ), for i = 1,2.
- We can always define y so that (x,y) = y (otherwise we set y′ = (x,y) and replace the y's by the y′'s).
11 EM for 2 coin tosses - E step
- Calculate Qλ(λ*) = Qλ(q*H, q*T):
  Qλ(λ*) = p(y1|x,λ) log p(x,y1|λ*) + p(y2|x,λ) log p(x,y2|λ*)
- p(y1|x,λ) = p(y1,x|λ)/p(x|λ) = (¾·¼)/(¾) = ¼
- p(y2|x,λ) = p(y2,x|λ)/p(x|λ) = (¾·¾)/(¾) = ¾
- Thus we have
  Qλ(λ*) = ¼ log p(x,y1|λ*) + ¾ log p(x,y2|λ*)
12 EM for 2 coin tosses - E step
- For a sequence y of coin tosses, let NH(y) be the number of H's in y, and NT(y) the number of T's in y. Then
  log p(y|λ*) = NH(y) log q*H + NT(y) log q*T
- In our example:
- y1 = (T,H), y2 = (T,T), hence
- NH(y1) = NT(y1) = 1; NH(y2) = 0, NT(y2) = 2
13 Example: 2 coin tosses - E step
- Thus
  ¼ log p(x,y1|λ*) = ¼ (NH(y1) log q*H + NT(y1) log q*T) = ¼ (log q*H + log q*T)
  ¾ log p(x,y2|λ*) = ¾ (NH(y2) log q*H + NT(y2) log q*T) = ¾ (2 log q*T)
- Substituting in the equation for Qλ(λ*):
  Qλ(λ*) = ¼ log p(x,y1|λ*) + ¾ log p(x,y2|λ*)
         = (¼ NH(y1) + ¾ NH(y2)) log q*H + (¼ NT(y1) + ¾ NT(y2)) log q*T
- Writing NH = ¼ NH(y1) + ¾ NH(y2) = ¼ and NT = ¼ NT(y1) + ¾ NT(y2) = 7/4, this becomes
  Qλ(λ*) = NH log q*H + NT log q*T
14 EM for 2 coin tosses - M step
Find λ* which maximizes
  Qλ(λ*) = NH log q*H + NT log q*T = ¼ log q*H + 7/4 log q*T.
We saw earlier that this is maximized by the relative frequencies
  q*H = NH/(NH + NT) = (¼)/2 = 1/8,  q*T = NT/(NH + NT) = (7/4)/2 = 7/8.
The optimal parameters (0,1) will never be reached by the EM algorithm!
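A small sketch of this iteration (an illustration, not from the slides) makes the last remark concrete: qH is halved at every step, so it approaches 0 without ever reaching it. Exact fractions are used so the values can be checked against the computation above.

```python
from fractions import Fraction

# One EM iteration for the 2-coin-toss example: data x = (T, ?),
# with the second toss hidden.
def em_step(qH):
    qT = 1 - qH
    p1, p2 = qT * qH, qT * qT         # joints of y1 = (T,H), y2 = (T,T)
    px = p1 + p2                      # p(x | lambda)
    w1, w2 = p1 / px, p2 / px         # posteriors p(yi | x, lambda)
    NH = w1 * 1 + w2 * 0              # expected number of H's
    NT = w1 * 1 + w2 * 2              # expected number of T's
    return NH / (NH + NT)             # M step: relative frequency

qH = Fraction(1, 4)
for _ in range(5):
    qH = em_step(qH)
    print(qH)   # 1/8, 1/16, 1/32, ... -> qH tends to 0, never reaching it
```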
15 EM for single random variable (dice)
Now, the probability of each y (recall y = (x,y)) is given by a sequence of dice tosses. The dice has m outcomes, with probabilities q1,..,qm. Let Nl(y) = #(outcome l occurs in y). Then
  log p(y|λ*) = Σl Nl(y) log q*l
Let Nl be the expected value of Nl(y), given x and λ:
  Nl = E(Nl(y)|x,λ) = Σy p(y|x,λ) Nl(y)
16 Qλ(λ*) for one dice
By the same computation as for the coin,
  Qλ(λ*) = Σy p(y|x,λ) log p(y|λ*) = Σl Nl log q*l,
which is maximized by the relative frequencies q*l = Nl / Σl' Nl'.
17 EM algorithm for n independent observations x1,…,xn
Expectation step: It can be shown that, if the xj are independent, then
  Nl = Σj Nl^j, where Nl^j = E(Nl(yj)|xj,λ) = Σyj p(yj|xj,λ) Nl(yj)
18 Example: The ABO locus
A locus is a particular place on the chromosome. Each locus state (called a genotype) consists of two alleles: one paternal and one maternal. Some loci (plural of locus) determine distinguishable features. The ABO locus, for example, determines blood type.
Suppose we randomly sampled N individuals and found that Na/a have genotype a/a, Na/b have genotype a/b, etc. Then the MLE is given by the allele frequencies in the sample:
  qa = (2Na/a + Na/b + Na/o) / (2N), and similarly for qb and qo.
19 The ABO locus
However, testing individuals for their genotype is very expensive. Can we estimate the genotype proportions using the common, cheap blood test, whose outcome is one of the four blood types (A, B, AB, O)?
The problem is that among individuals measured to have blood type A, we don't know how many have genotype a/a and how many have genotype a/o. So what can we do?
20 The ABO locus
The Hardy-Weinberg equilibrium rule states that in equilibrium the frequencies of the three alleles qa, qb, qo in the population determine the frequencies of the genotypes as follows: qa/b = 2qaqb, qa/o = 2qaqo, qb/o = 2qbqo, qa/a = qa², qb/b = qb², qo/o = qo². In fact, the Hardy-Weinberg equilibrium rule follows from modeling this problem as data x with hidden data y.
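As a quick consistency check (not on the slide), these genotype frequencies sum to 1, since the two alleles are drawn independently:

```latex
(q_a + q_b + q_o)^2 = q_a^2 + q_b^2 + q_o^2 + 2q_aq_b + 2q_aq_o + 2q_bq_o = 1.
```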
21 The ABO locus
The dice outcomes are the three possible alleles a, b and o. The observed data are the blood types A, B, AB or O. Each blood type is determined by two successive random samplings of alleles, which form an ordered genotype pair; this is the hidden data. For instance, blood type A corresponds to the ordered genotype pairs (a,a), (a,o) and (o,a). So we have three parameters of one dice, qa, qb, qo, that we need to estimate.
22 EM setting for the ABO locus problem
The observed data x = (x1,..,xn) is a sequence of letters (blood types) from the alphabet {A, B, AB, O}; e.g., (B,A,B,B,O,A,B,A,O,B,AB) are observations (x1,…,x11). The hidden data (i.e., the y's) for each letter xj are the ordered pairs of alleles that can generate it. For instance, for A it is the set {(a,a), (a,o), (o,a)}. The parameters λ = {qa, qb, qo} are the probabilities of the alleles; we need to find the λ that maximizes the likelihood of the given data. We do this by the EM algorithm.
23 EM for ABO locus problem
- For each observed blood type xj ∈ {A,B,AB,O} and for each allele z ∈ {a,b,o} we compute Nz(xj), the expected number of times that z appears in xj:
  Nz(xj) = Σyj p(yj|xj,λ) Nz(yj),
where the sum is taken over the ordered genotype pairs yj, and Nz(yj) is the number of times allele z occurs in the pair yj; e.g., Na((o,b)) = 0, Nb((o,b)) = No((o,b)) = 1.
24 EM for ABO locus problem
The computation for blood type B:
  P(B|λ) = P((b,b)|λ) + p((b,o)|λ) + p((o,b)|λ) = qb² + 2qbqo.
Since Nb((b,b)) = 2, and Nb((b,o)) = Nb((o,b)) = No((o,b)) = No((b,o)) = 1, the quantities No(B) and Nb(B), the expected numbers of occurrences of o and b in B, are given by
  Nb(B) = (2qb² + 2qbqo) / (qb² + 2qbqo),  No(B) = 2qbqo / (qb² + 2qbqo).
Observe that Nb(B) + No(B) = 2.
25 EM for ABO loci
Similarly, P(A|λ) = qa² + 2qaqo, with
  Na(A) = (2qa² + 2qaqo) / (qa² + 2qaqo),  No(A) = 2qaqo / (qa² + 2qaqo).
P(AB|λ) = p((b,a)|λ) + p((a,b)|λ) = 2qaqb, and
  Na(AB) = Nb(AB) = 1.
P(O|λ) = p((o,o)|λ) = qo², and
  No(O) = 2.
Finally, Nb(O) = Na(O) = No(AB) = Nb(A) = Na(B) = 0.
26 E step: compute Na, Nb and No
Let #(A) = 3, #(B) = 5, #(AB) = 1, #(O) = 2 be the numbers of observations of A, B, AB, and O respectively. Then the expected allele counts are
  Na = #(A)·Na(A) + #(AB)·Na(AB),  Nb = #(B)·Nb(B) + #(AB)·Nb(AB),
  No = #(A)·No(A) + #(B)·No(B) + #(O)·No(O).
M step: set λ* = (q*a, q*b, q*o) = (Na, Nb, No) / (Na + Nb + No).
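A compact sketch of this E/M iteration (an illustration, not the slides' code), using the observation counts above; with these counts the total expected allele count Na + Nb + No is always 2n = 22. The uniform initial parameters are an arbitrary choice.

```python
# EM for the ABO locus example, with #(A)=3, #(B)=5, #(AB)=1, #(O)=2.
counts = {'A': 3, 'B': 5, 'AB': 1, 'O': 2}

def em_step(qa, qb, qo):
    # E step: expected allele counts per blood type (slides 24-25)
    pA, pB = qa*qa + 2*qa*qo, qb*qb + 2*qb*qo
    Na = counts['A'] * (2*qa*qa + 2*qa*qo) / pA + counts['AB'] * 1
    Nb = counts['B'] * (2*qb*qb + 2*qb*qo) / pB + counts['AB'] * 1
    No = (counts['A'] * 2*qa*qo / pA + counts['B'] * 2*qb*qo / pB
          + counts['O'] * 2)
    # M step: relative frequencies (Na + Nb + No = 2n = 22)
    total = Na + Nb + No
    return Na / total, Nb / total, No / total

qa, qb, qo = 1/3, 1/3, 1/3          # arbitrary initial parameters
for _ in range(20):
    qa, qb, qo = em_step(qa, qb, qo)
print(qa, qb, qo)                    # converges to a local MLE
```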
27 Example: the Motif Finding Problem
- Given a set of DNA sequences:
- cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat
- agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc
- aaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt
- agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtacgtataca
- ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaacgtacgtc
- Find the motif in each of the individual sequences.
28 Motif Finding Problem: a reformulation
- Collect all substrings of the same length k from the input sequences.
- From N sequences, each of length L, n = N·(L−k+1) substrings can be derived.
- Find a significant number of substrings that can be described by a profile model.
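For instance, a one-line sketch of the substring collection step (illustrative; the sequence below is the first one from the previous slide):

```python
# Collect all length-k substrings from a list of sequences.
def all_kmers(seqs, k):
    return [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]

seq = "cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat"
print(len(all_kmers([seq], 8)))   # L - k + 1 = 68 - 8 + 1 = 61 substrings
```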
29 Fitting a mixture model by EM
- A finite mixture model:
- Data X = (X1,…,Xn) arises from two or more groups, with g models θ = (θ1,…,θg).
- Indicator vectors Z = (Z1,…,Zn), where Zi = (Zi1,…,Zig), and Zij = 1 if Xi is from group j, and 0 otherwise.
- P(Zij = 1|λ) = λj. For any given i, all Zij are 0 except one j.
- Here g = 2: class 1 (the motif) and class 2 (the background) are given by a position-specific and a general multinomial distribution, respectively.
30 Complete data likelihood
- Under the assumption that the pairs (Zi, Xi) are mutually independent, their joint density may be written
  P(Z, X|λ,θ) = Πij [λj P(Xi|θj)]^Zij
- The log likelihood of the model is thus
  log L(λ, θ|Z, X) = Σij Zij log(λj P(Xi|θj)).
- The EM algorithm iteratively computes the expectation of the likelihood given the observed data X and initial estimates λ⁰ and θ⁰ of λ and θ (the E-step), and then maximizes the result in the free variables λ and θ, leading to new estimates λ* and θ* (the M-step).
31 Mixture models: the E-step
- Since the log likelihood is a sum over i and j of terms multiplying Zij, and these are independent across i, we need only consider the expectation of one such term, given Xi. Using initial parameter values λ⁰ and θ⁰, and the fact that the Zij are binary, we get
  E(Zij|X, λ⁰, θ⁰) = λ⁰j P(Xi|θ⁰j) / Σk λ⁰k P(Xi|θ⁰k) =: Z*ij
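As a sketch (an illustration, not the slides' notation), with the per-class likelihoods precomputed, the responsibilities Z*ij are one normalization away:

```python
import numpy as np

# E-step responsibilities. p_x is an (n, g) array with entries
# P(X_i | theta_j); lam is the length-g vector of mixing proportions.
# Both are hypothetical inputs, computed elsewhere in the model.
def e_step(p_x, lam):
    w = p_x * lam                               # lam_j * P(X_i | theta_j)
    return w / w.sum(axis=1, keepdims=True)     # Z*_ij; each row sums to 1
```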
32 Mixture models: the M-step
- Now we want to maximize the result of the E-step:
  Σij Z*ij log λj + Σij Z*ij log P(Xi|θj).
- The maximization over λ is independent of the rest and is readily achieved with
  λ*j = Σi Z*ij / n.
33 Mixture models: the M-step
- Note that P(Xi|θ1) = Πj Πk fjk^I(k,Xij), and P(Xi|θ2) = Πj Πk f0k^I(k,Xij),
- where Xij is the letter in the jth position of sample i, and I(k,a) = 1 if a = ak, and 0 otherwise.
- Let c0k = Σi Σj Z*i2 I(k,Xij), and cjk = Σi Z*i1 I(k,Xij).
- Here c0k is the expected number of times letter ak appears in the background, and cjk the expected number of times ak appears at position j in occurrences of the motif in the data.
- The maximization over the frequencies f is achieved by
  fjk = cjk / Σk'=1..L cjk' ,  j = 0,1,…,w; k = 1,…,L.
- In practice, care must be taken to avoid zero frequencies, so one adds pseudo-counts: small constants βjk, with βj = Σk βjk, giving
  fjk = (cjk + βjk) / (Σk'=1..L cjk' + βj) ,  j = 0,1,…,w; k = 1,…,L.
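A sketch of this M-step (an illustration under the assumptions noted in the comments; the uniform pseudo-count beta is a simplification of the per-letter constants above):

```python
import numpy as np

# M-step for the two-class motif mixture. X is an (n, w) integer array of
# letter indices in 0..L-1 (w = motif width, L = alphabet size); Z is the
# (n, 2) responsibility matrix from the E-step: column 0 = motif class,
# column 1 = background class.
def m_step(X, Z, L, beta=0.1):
    n, w = X.shape
    lam = Z.sum(axis=0) / n                  # lambda*_j = sum_i Z*_ij / n
    c = np.zeros((w + 1, L))                 # row 0: background counts c0k
    for i in range(n):
        for j in range(w):
            c[0, X[i, j]] += Z[i, 1]         # background class
            c[j + 1, X[i, j]] += Z[i, 0]     # motif class, position j+1
    f = (c + beta) / (c.sum(axis=1, keepdims=True) + L * beta)
    return lam, f                            # each row of f sums to 1
```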