Maximum Likelihood Estimates and the EM Algorithms II

About This Presentation

Title:

Maximum Likelihood Estimates and the EM Algorithms II

Description:

Maximum Likelihood Estimates and the EM Algorithms II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu_at_stat.nctu.edu.tw – PowerPoint PPT presentation

Number of Views:51

Avg rating:3.0/5.0

Slides: 79

Provided by: tigpbpIis

Category:

more less

Transcript and Presenter's Notes

Title: Maximum Likelihood Estimates and the EM Algorithms II

1
Maximum Likelihood Estimates and the EM
Algorithms II

Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
hslu_at_stat.nctu.edu.tw
http//tigpbp.iis.sinica.edu.tw/courses.htm

2
Part 1Computation Tools
3
Include Functions in R

source("file path")
Example
In MME.R

In R

MMEfunction(y1, y2, y3, y4) ny1y2y3y4 ph
i14.0(y1/n-0.5) phi21-4y2/n phi31-4y3/n
phi44.0y4/n phi(phi1phi2phi3phi4)/4.0
print("By MME method") return(phi)
print(phi)
gt source(C/MME.R) gt MME(125, 18, 20, 24) 1
"By MME method" 1 0.5935829
4
Part 2Motivation Examples
5
Example 1 in Genetics (1)

Two linked loci with alleles A and a, and B and b
A, B dominant
a, b recessive
A double heterozygote AaBb will produce gametes
of four types AB, Ab, aB, ab

6
Example 1 in Genetics (2)

Probabilities for genotypes in gametes

No Recombination Recombination
Male 1-r r
Female 1-r r
AB ab aB Ab
Male (1-r)/2 (1-r)/2 r/2 r/2
Female (1-r)/2 (1-r)/2 r/2 r/2
7
Example 1 in Genetics (3)

Fisher, R. A. and Balmukand, B. (1928). The
estimation of linkage from the offspring of
selfed heterozygotes. Journal of Genetics, 20,
7992.
More
http//en.wikipedia.org/wiki/Genetics
http//www2.isye.gatech.edu/brani/isyebayes/bank/
handout12.pdf

8
Example 1 in Genetics (4)
MALE MALE MALE MALE
AB (1-r)/2 ab (1-r)/2 aB r/2 Ab r/2
F E M A L E AB (1-r)/2 AABB (1-r) (1-r)/4 aABb (1-r) (1-r)/4 aABB r (1-r)/4 AABb r (1-r)/4
F E M A L E ab (1-r)/2 AaBb (1-r) (1-r)/4 aabb (1-r) (1-r)/4 aaBb r (1-r)/4 Aabb r (1-r)/4
F E M A L E aB r/2 AaBB (1-r) r/4 aabB (1-r) r/4 aaBB r r/4 AabB r r/4
F E M A L E Ab r/2 AABb (1-r) r/4 aAbb (1-r) r/4 aABb r r/4 AAbb r r/4
9
Example 1 in Genetics (5)

Four distinct phenotypes
AB, Ab, aB and ab.
A the dominant phenotype from (Aa, AA, aA).
a the recessive phenotype from aa.
B the dominant phenotype from (Bb, BB, bB).
b the recessive phenotype from bb.
AB 9 gametic combinations.
Ab 3 gametic combinations.
aB 3 gametic combinations.
ab 1 gametic combination.
Total 16 combinations.

10
Example 1 in Genetics (6)

Let , then

11
Example 1 in Genetics (7)

Hence, the random sample of n from the offspring
of selfed heterozygotes will follow a multinomial
distribution
We know that
and
So

12
Example 1 in Genetics (8)

Suppose that we observe the data of
which is a random sample from
Then the probability mass function is

13
Maximum Likelihood Estimate (MLE)

Likelihood
Maximize likelihood Solve the score equations,
which are setting the first derivates of
likelihood to be zeros.
Under regular conditions, the MLE is consistent,
asymptotic efficient and normal!
More
http//en.wikipedia.org/wiki/Maximum_likelihood

14
MLE for Example 1 (1)

Likelihood
MLE

15
MLE for Example 1 (2)
A
B
C
16
MLE for Example 1 (3)

Checking
1.
2.
3. Compare ?

17
Part 3Numerical Solutions for the Score
Equations of MLEs
18
A Banach Space

A Banach space is a vector space over the
field such that
Every Cauchy sequence of converges in
(i.e., is complete).
More
http//en.wikipedia.org/wiki/Banach_space

19
Lipschitz Continuous

A closed subset and mapping
is Lipschitz continuous on with
if
is a contraction mapping on if is
Lipschitz continuous and
More
http//en.wikipedia.org/wiki/Lipschitz_continuous

20
Fixed Point Theorem (1)

If is a contraction mapping on if is
Lipschitz continuous and
has an unique fixed point
such that
initial

21
Fixed Point Theorem (2)

More
http//en.wikipedia.org/wiki/Fixed-point_theorem
http//www.math-linux.com/spip.php?article60

22
Applications for MLE (1)

Numerical solution
is a contraction mapping
initial value that
Then

23
Applications for MLE (2)

How to choose s.t. is a contraction
mapping?
Optimal ?

24
Parallel Chord Method (1)

Parallel chord method is also called simple
iteration.

25
Parallel Chord Method (2)
26
Plot Parallel Chord Method by R (1)

Simple iteration
y1 125 y2 18 y3 20 y4 24
First and second derivatives of log likelihood
f1 lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4
/phi
f2 lt- function(phi) (-1)y1(2phi)(-2)-(y2y3)
(1-phi)(-2)-y4(phi)(-2)
x c(1080)0.01
y f1(x)
plot(x, y, type 'l', main "Parallel chord
method", xlab expression(varphi), ylab "First
derivative of log likelihood function")
abline(h 0)

27
Plot Parallel Chord Method by R (2)

phi0 0.25 Given the initial value 0.25
segments(0, f1(phi0), phi0, f1(phi0), col
"green", lty 2)
segments(phi0, f1(phi0), phi0, -200, col
"green", lty 2)
Use the tangent line to find the intercept b0
b0 f1(phi0)-f2(phi0)phi0
curve(f2(phi0)xb0, add T, col "red")
phi1 -b0/f2(phi0) Find the closer phi
segments(phi1, -200, phi1, f1(phi1), col
"green", lty 2)
segments(0, f1(phi1), phi1, f1(phi1), col
"green", lty 2)
Use the parallel line to find the intercept b1
b1 f1(phi1)-f2(phi0)phi1
curve(f2(phi0)xb1, add T, col "red")

28
Define Functions for Example 1 in R

We will define some functions and variables for
finding the MLE in Example 1 by R

Fist, second and third derivatives of log
likelihood f1 function(y1, y2, y3, y4,
phi)y1/(2phi)-(y2y3)/(1-phi)y4/phi f2
function(y1, y2, y3, y4, phi) (-1)y1(2phi)(-2
)-(y2y3)(1-phi)(-2)-y4(phi)(-2) f3
function(y1, y2, y3, y4, phi) 2y1(2phi)(-3)-2
(y2y3)(1-phi)(-3)2y4(phi)(-3) Fisher
Information I function(y1, y2, y3, y4, phi)
(-1)(y1y2y3y4)(1/(4(2phi))1/(2(1-phi))1
/(4phi)) y1 125 y2 18 y3 20 y4 24
initial 0.9
29
Parallel Chord Method by R (1)

gt fix(SimpleIteration)
function(y1, y2, y3, y4, initial)
phi NULL
i 0
alpha -1.0/f2(y1, y2, y3, y4, initial)
phi2 initial
phi1 initial1
while(abs(phi1-phi2) gt 1.0E-5)
i i1
phi1 phi2
phi2 alphaf1(y1, y2, y3, y4, phi1)phi1
phii phi2
print("By parallel chord method(simple
iteration)")
return(list(phi phi2, iteration phi))

30
Parallel Chord Method by R (2)

gt SimpleIteration(y1, y2, y3, y4, initial)

31
Parallel Chord Method by C/C
32
Newton-Raphson Method (1)

http//math.fullerton.edu/mathews/n2003/Newton'sMe
thodMod.html
http//en.wikipedia.org/wiki/Newton27s_method

33
Newton-Raphson Method (2)
34
Plot Newton-Raphson Method by R (1)

Newton-Raphson Method
y1 125 y2 18 y3 20 y4 24
First and second derivatives of log likelihood
f1 lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4
/phi
f2 lt- function(phi) (-1)y1(2phi)(-2)-(y2y3)
(1-phi)(-2)-y4(phi)(-2)
x c(1080)0.01
y f1(x)
plot(x, y, type 'l', main "Newton-Raphson
method", xlab expression(varphi), ylab "First
derivative of log likelihood function")
abline(h 0)

35
Plot Newton-Raphson Method by R (2)

Given the initial value 0.25
phi0 0.25
segments(0, f1(phi0), phi0, f1(phi0), col
"green", lty 2)
segments(phi0, f1(phi0), phi0, -200, col
"green", lty 2)
Use the tangent line to find the intercept b0
b0 f1(phi0)-f2(phi0)phi0
curve(f2(phi0)xb0, add T, col "purple", lwd
2)
Find the closer phi
phi1 -b0/f2(phi0)
segments(phi1, -200, phi1, f1(phi1), col
"green", lty 2)
segments(0, f1(phi1), phi1, f1(phi1), col
"green", lty 2)

36
Plot Newton-Raphson Method by R (3)

Use the parallel line to find the intercept b1
b1 f1(phi1)-f2(phi0)phi1
curve(f2(phi0)xb1, add T, col "red")
curve(f2(phi1)x-f2(phi1)phi1f1(phi1), add T,
col "blue", lwd 2)
legend(0.45, 250, c("Newton-Raphson", "Parallel
chord method"), col c("blue", "red"), lty
c(1, 1))

37
Newton-Raphson Method by R (1)

gt fix(Newton)
function(y1, y2, y3, y4, initial)
i 0
phi NULL
phi2 initial
phi1 initial1
while(abs(phi1-phi2) gt 1.0E-6)
i i1
phi1 phi2
alpha 1.0/(f2(y1, y2, y3, y4, phi1))
phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1)
phii phi2
print("By Newton-Raphson method")
return (list(phi phi2, iteration phi))

38
Newton-Raphson Method by R (2)

gt Newton(125, 18, 20, 24, 0.9)
1 "By Newton-Raphson method"
phi
1 0.577876
iteration
1 0.8193054 0.7068499 0.6092827 0.5792747
0.5778784 0.5778760 0.5778760

39
Newton-Raphson Method by C/C
40
Halleys Method

The Newton-Raphson iteration function is
It is possible to speed up convergence by using
more expansion terms than the Newton-Raphson
method does when the object function is very
smooth, like the method by Edmond Halley
(1656-1742)

(http//math.fullerton.edu/mathews/n2003/Halley'sM
ethodMod.html)
41
Halleys Method by R (1)

gt fix(Halley)
function( y1, y2, y3, y4, initial)
i 0
phi NULL
phi2 initial
phi1 initial1
while(abs(phi1-phi2) gt 1.0E-6)
i i1
phi1 phi2
alpha 1.0/(f2(y1, y2, y3, y4, phi1))
phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1)1.0/(1.0-f1(y1, y2, y3, y4,
phi1)f3(y1, y2, y3, y4, phi1)/(f2(y1, y2, y3,
y4, phi1)f2(y1, y2, y3, y4, phi1)2.0))
phii phi2
print("By Halley method")
return (list(phi phi2, iteration phi))

42
Halleys Method by R (2)

gt Halley(125, 18, 20, 24, 0.9)
1 "By Halley method"
phi
1 0.577876
iteration
1 0.5028639 0.5793692 0.5778760 0.5778760

43
Halleys Method by C/C
44
Bisection Method (1)

Assume that and that there exists a
number such that . If
and have opposite signs, and represents the
sequence of midpoints generated by the bisection
process, thenand the sequence converges
to .
That is, .
http//en.wikipedia.org/wiki/Bisection_method

45
Bisection Method (2)
1
46
Plot the Bisection Method by R

Bisection method
y1 125 y2 18 y3 20 y4 24
f lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4/
phi
x c(1100)0.01
y f(x)
plot(x, y, type 'l', main "Bisection method",
xlab expression(varphi), ylab "First
derivative of log likelihood function")
abline(h 0)
abline(v 0.5, col "red")
abline(v 0.75, col "red")
text(0.49, 2200, labels "1")
text(0.74, 2200, labels "2")

47
Bisection Method by R (1)

gt fix(Bisection)
function(y1, y2, y3, y4, A, B) A, B is the
boundary of parameter
Delta 1.0E-6 Tolerance for width of
interval
Satisfied 0 Condition for loop termination
phi NULL
YA f1(y1, y2, y3, y4, A) Compute function
values
YB f1(y1, y2, y3, y4, B)
Calculation of the maximum number of
iterations
Max as.integer(1floor((log(B-A)-log(Delta))/lo
g(2)))
Check to see if the bisection method applies
if(((YA gt 0) (YB gt0)) ((YA lt 0) (YB lt
0)))
print("The values of function in boundary point
do not differ in sign.")

48
Bisection Method by R (2)

print("Therefore, this method is not
appropriate here.")
quit() Exit program
for(K in 1Max)
if(Satisfied 1)
break
C (AB)/2 Midpoint of interval
YC f1(y1, y2, y3, y4, C) Function value at
midpoint
if((K-1) lt 100)
phiK-1 C
if(YC 0)
A C Exact root is found
B C
else
if((YBYC) gt 0 )

49
Bisection Method by R (3)

B C Squeeze from the right
YB YC
else
A C Squeeze from the left
YA YC
if((B-A) lt Delta)
Satisfied 1 Check for early convergence
End of 'for'-loop
print("By Bisection Method")
return(list(phi C, iteration phi))

50
Bisection Method by R (4)

gt Bisection(125, 18, 20, 24, 0.25, 1)
1 "By Bisection Method"
phi
1 0.5778754
iteration
1 0.4375000 0.5312500 0.5781250 0.5546875
0.5664062 0.5722656 0.5751953
8 0.5766602 0.5773926 0.5777588 0.5779419
0.5778503 0.5778961 0.5778732
15 0.5778847 0.5778790 0.5778761 0.5778747
0.5778754

51
Bisection Method by C/C (1)
52
Bisection Method by C/C (2)
53
Secant Method

More http//en.wikipedia.org/wiki/Secant_method
http//math.fullerton.edu/mathews/n2003/SecantMeth
odMod.html

54
Secant Method by R (1)

gt fix(Secant)
function(y1, y2, y3, y4, initial1, initial2)
phi NULL
phi2 initial1
phi1 initial2
alpha (phi2-phi1)/(f1(y1, y2, y3, y4,
phi2)-f1(y1, y2, y3, y4, phi1))
phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1)
i 0
while(abs(phi1-phi2) gt 1.0E-6)
i i1
phi1 phi2

55
Secant Method by R (2)

alpha (phi2-phi1)/(f1(y1, y2, y3, y4,
phi2)-f1(y1, y2, y3, y4, phi1))
phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1)
phii phi2
print("By Secant method")
return (list(phiphi2,iterationphi))

56
Secant Method by R (3)

gt Secant(125, 18, 20, 24, 0.9, 0.05)
1 "By Secant method"
phi
1 0.577876
iteration
1 0.2075634 0.4008018 0.5760396 0.5778801
0.5778760 0.5778760

57
Secant Method by C/C
58
Secant-Bracket Method

The secant-bracket method is also called the
regular falsi method.

59
Secant-Bracket Method by R (1)

gt fix(RegularFalsi)
function(y1, y2, y3, y4, A, B)
phi NULL
i -1
Delta 1.0E-6 Tolerance for width of
interval
Satisfied 1 Condition for loop termination
Endpoints of the interval A,B
YA f1(y1, y2, y3, y4, A) compute function
values
YB f1(y1, y2, y3, y4, B)
Check to see if the bisection method applies
if(((YA gt 0) (YB gt0)) ((YA lt 0) (YB lt
0)))
print("The values of function in boundary point
do not differ in sign")
print("Therefore, this method is not
appropriate here")
q() Exit program

60
Secant-Bracket Method by R (2)

while(Satisfied)
i i1
C (Bf1(y1, y2, y3, y4, A)-Af1(y1, y2, y3,
y4, B))/(f1(y1, y2, y3, y4, A)-f1(y1, y2, y3, y4,
B)) Midpoint of interval
YC f1(y1, y2, y3, y4, C) Function value at
midpoint
phii C
if(YC 0) First 'if'
A C Exact root is found
B C
else
if((YBYC) gt 0 )
B C Squeeze from the right
YB YC

61
Secant-Bracket Method by R (3)

else
A C Squeeze from the left
YA YC
if(f1(y1, y2, y3, y4, C) lt Delta)
Satisfied 0 Check for early convergence
print("By Regular Falsi Method")
return(list(phi C, iteration phi))

62
Secant-Bracket Method by R (4)

gt RegularFalsi(y1, y2, y3, y4, 0.9, 0.05)
1 "By Regular Falsi Method"
phi
1 0.577876
iteration
1 0.5758648 0.5765007 0.5769352 0.5772324
0.5774356 0.5775746 0.5776698
8 0.5777348 0.5777794 0.5778099 0.5778307
0.5778450 0.5778548 0.5778615
15 0.5778660 0.5778692 0.5778713 0.5778728
0.5778738 0.5778745 0.5778750
22 0.5778753 0.5778755 0.5778756 0.5778757
0.5778758 0.5778759 0.5778759
29 0.5778759 0.5778759 0.5778759 0.5778760
0.5778760 0.5778760 0.5778760
36 0.5778760 0.5778760

63
Secant-Bracket Method by C/C (1)
64
Secant-Bracket Method by C/C (2)
65
Fisher Scoring Method

Fisher scoring method replaces
by where is
the Fisher information matrix when the parameter
may be multivariate.

66
Fisher Scoring Method by R (1)

gt fix(Fisher)
function(y1, y2, y3, y4, initial)
i 0
phi NULL
phi2 initial
phi1 initial1
while(abs(phi1-phi2) gt 1.0E-6)
i i1
phi1 phi2
alpha 1.0/I(y1, y2, y3, y4, phi1)
phi2 phi1-f1(y1, y2, y3, y4, phi1)/I(y1, y2,
y3, y4, phi1)
phii phi2
print("By Fisher method")
return(list(phi phi2, iteration phi))

67
Fisher Scoring Method by R (2)

gt Fisher(125, 18, 20, 24, 0.9)
1 "By Fisher method"
phi
1 0.577876
iteration
1 0.5907181 0.5785331 0.5779100 0.5778777
0.5778761 0.5778760

68
Fisher Scoring Method by C/C
69
Order of Convergence

Order of convergence is if
and for .
More http//en.wikipedia.org/wiki/Order_of_conver
gence
Note as
Hence, we can use regression to estimate

70
Theorem for Newton-Raphson Method

If , is a
contraction mapping then
and
If exists,
has a simple zero, then
such that of the Newton-Raphson method
is a contraction mapping and .

71
Find Convergence Order by R (1)

gt Coverage order
gt Newton method can be substitute for different
method
gt R Newton(y1, y2, y3, y4, initial)
1 "By Newton-Raphson method"
gt temp log(abs(Riteration-Rphi))
gt y temp2(length(temp)-1)
gt x temp1(length(temp)-2)
gt lm(yx)
Call
lm(formula y x)
Coefficients
(Intercept) x
0.6727 2.0441

72
Find Convergence Order by R (2)
gtR SimpleIteration(y1, y2, y3, y4, initial) 1
"By parallel chord method(simple iteration)" gt
temp log(abs(Riteration-Rphi)) gt y
temp2(length(temp)-1) gt x temp1(length(temp
)-2) gt lm(yx) Call lm(formula y
x) Coefficients (Intercept) x
-0.01651 1.01809
gt R Secant(y1, y2, y3, y4, initial, 0.05) 1
"By Secant method" gt temp log(abs(Riteration-R
phi)) gt y temp2(length(temp)-1) gt x
temp1(length(temp)-2) gt lm(yx) Call lm(formul
a y x) Coefficients (Intercept)
x -1.245 1.869
gt R Fisher(y1, y2, y3, y4, initial) 1 "By
Fisher method" gt templog(abs(Riteration-Rphi))
gt ytemp2(length(temp)-1) gt xtemp1(length(te
mp)-2) gt lm(yx) Call lm(formula y
x) Coefficients (Intercept) x
-2.942 1.004
gt R Bisection(y1, y2, y3, y4, 0.25, 1) 1 "By
Bisection Method" gt templog(abs(Riteration-Rphi
)) gt ytemp2(length(temp)-1) gt
xtemp1(length(temp)-2) gt lm(yx) Call lm(form
ula y x) Coefficients (Intercept)
x -1.9803 0.8448
73
Find Convergence Order by R (3)

gt R RegularFalsi(y1, y2, y3, y4, initial, 0.05)
1 "By Regular Falsi Method"
gt temp log(abs(Riteration-Rphi))
gt y temp2(length(temp)-1)
gt x temp1(length(temp)-2)
gt lm(yx)
Call
lm(formula y x)
Coefficients
(Intercept) x
-0.2415 1.0134

74
Find Convergence Order by C/C
75
Exercises

Write your own programs for those examples
presented in this talk.
Write programs for those examples mentioned at
the following web page
http//en.wikipedia.org/wiki/Maximum_likelihood
Write programs for the other examples that you
know.

76
More Exercises (1)

Example 3 in genetics
The observed data are
where , , and fall in
such that
Find the MLEs for , , and .

77
More Exercises (2)

Example 4 in the positron emission tomography
(PET)
The observed data are
and
The values of are known and the unknown
parameters are .
Find the MLEs for .

78
More Exercises (3)

Example 5 in the normal mixture
The observed data are random
samples from the following probability density
function
Find the MLEs for the following parameters

Write a Comment

User Comments (0)