Title: Maximum Likelihood Estimates and the EM Algorithms II
1Maximum Likelihood Estimates and the EM
Algorithms II
- Henry Horng-Shing Lu
- Institute of Statistics
- National Chiao Tung University
- hslu_at_stat.nctu.edu.tw
- http//tigpbp.iis.sinica.edu.tw/courses.htm
2Part 1Computation Tools
3Include Functions in R
- source("file path")
- Example
- In MME.R
MMEfunction(y1, y2, y3, y4) ny1y2y3y4 ph
i14.0(y1/n-0.5) phi21-4y2/n phi31-4y3/n
phi44.0y4/n phi(phi1phi2phi3phi4)/4.0
print("By MME method") return(phi)
print(phi)
gt source(C/MME.R) gt MME(125, 18, 20, 24) 1
"By MME method" 1 0.5935829
4Part 2Motivation Examples
5Example 1 in Genetics (1)
- Two linked loci with alleles A and a, and B and b
- A, B dominant
- a, b recessive
- A double heterozygote AaBb will produce gametes
of four types AB, Ab, aB, ab
6Example 1 in Genetics (2)
- Probabilities for genotypes in gametes
No Recombination Recombination
Male 1-r r
Female 1-r r
AB ab aB Ab
Male (1-r)/2 (1-r)/2 r/2 r/2
Female (1-r)/2 (1-r)/2 r/2 r/2
7Example 1 in Genetics (3)
- Fisher, R. A. and Balmukand, B. (1928). The
estimation of linkage from the offspring of
selfed heterozygotes. Journal of Genetics, 20,
7992. - More
- http//en.wikipedia.org/wiki/Genetics
http//www2.isye.gatech.edu/brani/isyebayes/bank/
handout12.pdf
8Example 1 in Genetics (4)
MALE MALE MALE MALE
AB (1-r)/2 ab (1-r)/2 aB r/2 Ab r/2
F E M A L E AB (1-r)/2 AABB (1-r) (1-r)/4 aABb (1-r) (1-r)/4 aABB r (1-r)/4 AABb r (1-r)/4
F E M A L E ab (1-r)/2 AaBb (1-r) (1-r)/4 aabb (1-r) (1-r)/4 aaBb r (1-r)/4 Aabb r (1-r)/4
F E M A L E aB r/2 AaBB (1-r) r/4 aabB (1-r) r/4 aaBB r r/4 AabB r r/4
F E M A L E Ab r/2 AABb (1-r) r/4 aAbb (1-r) r/4 aABb r r/4 AAbb r r/4
9Example 1 in Genetics (5)
- Four distinct phenotypes
- AB, Ab, aB and ab.
- A the dominant phenotype from (Aa, AA, aA).
- a the recessive phenotype from aa.
- B the dominant phenotype from (Bb, BB, bB).
- b the recessive phenotype from bb.
- AB 9 gametic combinations.
- Ab 3 gametic combinations.
- aB 3 gametic combinations.
- ab 1 gametic combination.
- Total 16 combinations.
10Example 1 in Genetics (6)
11Example 1 in Genetics (7)
- Hence, the random sample of n from the offspring
of selfed heterozygotes will follow a multinomial
distribution - We know that
- and
- So
12Example 1 in Genetics (8)
- Suppose that we observe the data of
- which is a random sample from
- Then the probability mass function is
13Maximum Likelihood Estimate (MLE)
- Likelihood
- Maximize likelihood Solve the score equations,
which are setting the first derivates of
likelihood to be zeros. - Under regular conditions, the MLE is consistent,
asymptotic efficient and normal! - More
- http//en.wikipedia.org/wiki/Maximum_likelihood
14MLE for Example 1 (1)
15MLE for Example 1 (2)
A
B
C
16MLE for Example 1 (3)
- Checking
- 1.
- 2.
- 3. Compare ?
17Part 3Numerical Solutions for the Score
Equations of MLEs
18A Banach Space
- A Banach space is a vector space over the
field such that -
-
-
-
- Every Cauchy sequence of converges in
(i.e., is complete). - More
- http//en.wikipedia.org/wiki/Banach_space
19Lipschitz Continuous
- A closed subset and mapping
- is Lipschitz continuous on with
- if
- is a contraction mapping on if is
Lipschitz continuous and - More
- http//en.wikipedia.org/wiki/Lipschitz_continuous
20Fixed Point Theorem (1)
- If is a contraction mapping on if is
Lipschitz continuous and - has an unique fixed point
- such that
- initial
-
21Fixed Point Theorem (2)
- More
- http//en.wikipedia.org/wiki/Fixed-point_theorem
- http//www.math-linux.com/spip.php?article60
22Applications for MLE (1)
- Numerical solution
- is a contraction mapping
- initial value that
- Then
23Applications for MLE (2)
- How to choose s.t. is a contraction
mapping? - Optimal ?
24Parallel Chord Method (1)
- Parallel chord method is also called simple
iteration. -
25Parallel Chord Method (2)
26Plot Parallel Chord Method by R (1)
- Simple iteration
- y1 125 y2 18 y3 20 y4 24
- First and second derivatives of log likelihood
- f1 lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4
/phi - f2 lt- function(phi) (-1)y1(2phi)(-2)-(y2y3)
(1-phi)(-2)-y4(phi)(-2) - x c(1080)0.01
- y f1(x)
- plot(x, y, type 'l', main "Parallel chord
method", xlab expression(varphi), ylab "First
derivative of log likelihood function") - abline(h 0)
27Plot Parallel Chord Method by R (2)
- phi0 0.25 Given the initial value 0.25
- segments(0, f1(phi0), phi0, f1(phi0), col
"green", lty 2) - segments(phi0, f1(phi0), phi0, -200, col
"green", lty 2) - Use the tangent line to find the intercept b0
- b0 f1(phi0)-f2(phi0)phi0
- curve(f2(phi0)xb0, add T, col "red")
- phi1 -b0/f2(phi0) Find the closer phi
- segments(phi1, -200, phi1, f1(phi1), col
"green", lty 2) - segments(0, f1(phi1), phi1, f1(phi1), col
"green", lty 2) - Use the parallel line to find the intercept b1
- b1 f1(phi1)-f2(phi0)phi1
- curve(f2(phi0)xb1, add T, col "red")
28Define Functions for Example 1 in R
- We will define some functions and variables for
finding the MLE in Example 1 by R
Fist, second and third derivatives of log
likelihood f1 function(y1, y2, y3, y4,
phi)y1/(2phi)-(y2y3)/(1-phi)y4/phi f2
function(y1, y2, y3, y4, phi) (-1)y1(2phi)(-2
)-(y2y3)(1-phi)(-2)-y4(phi)(-2) f3
function(y1, y2, y3, y4, phi) 2y1(2phi)(-3)-2
(y2y3)(1-phi)(-3)2y4(phi)(-3) Fisher
Information I function(y1, y2, y3, y4, phi)
(-1)(y1y2y3y4)(1/(4(2phi))1/(2(1-phi))1
/(4phi)) y1 125 y2 18 y3 20 y4 24
initial 0.9
29Parallel Chord Method by R (1)
- gt fix(SimpleIteration)
- function(y1, y2, y3, y4, initial)
- phi NULL
- i 0
- alpha -1.0/f2(y1, y2, y3, y4, initial)
- phi2 initial
- phi1 initial1
- while(abs(phi1-phi2) gt 1.0E-5)
- i i1
- phi1 phi2
- phi2 alphaf1(y1, y2, y3, y4, phi1)phi1
- phii phi2
-
- print("By parallel chord method(simple
iteration)") - return(list(phi phi2, iteration phi))
30Parallel Chord Method by R (2)
- gt SimpleIteration(y1, y2, y3, y4, initial)
31Parallel Chord Method by C/C
32Newton-Raphson Method (1)
-
- http//math.fullerton.edu/mathews/n2003/Newton'sMe
thodMod.html - http//en.wikipedia.org/wiki/Newton27s_method
33Newton-Raphson Method (2)
34Plot Newton-Raphson Method by R (1)
- Newton-Raphson Method
- y1 125 y2 18 y3 20 y4 24
- First and second derivatives of log likelihood
- f1 lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4
/phi - f2 lt- function(phi) (-1)y1(2phi)(-2)-(y2y3)
(1-phi)(-2)-y4(phi)(-2) - x c(1080)0.01
- y f1(x)
- plot(x, y, type 'l', main "Newton-Raphson
method", xlab expression(varphi), ylab "First
derivative of log likelihood function") - abline(h 0)
35Plot Newton-Raphson Method by R (2)
- Given the initial value 0.25
- phi0 0.25
- segments(0, f1(phi0), phi0, f1(phi0), col
"green", lty 2) - segments(phi0, f1(phi0), phi0, -200, col
"green", lty 2) - Use the tangent line to find the intercept b0
- b0 f1(phi0)-f2(phi0)phi0
- curve(f2(phi0)xb0, add T, col "purple", lwd
2) - Find the closer phi
- phi1 -b0/f2(phi0)
- segments(phi1, -200, phi1, f1(phi1), col
"green", lty 2) - segments(0, f1(phi1), phi1, f1(phi1), col
"green", lty 2)
36Plot Newton-Raphson Method by R (3)
- Use the parallel line to find the intercept b1
- b1 f1(phi1)-f2(phi0)phi1
- curve(f2(phi0)xb1, add T, col "red")
- curve(f2(phi1)x-f2(phi1)phi1f1(phi1), add T,
col "blue", lwd 2) - legend(0.45, 250, c("Newton-Raphson", "Parallel
chord method"), col c("blue", "red"), lty
c(1, 1))
37Newton-Raphson Method by R (1)
- gt fix(Newton)
- function(y1, y2, y3, y4, initial)
- i 0
- phi NULL
- phi2 initial
- phi1 initial1
- while(abs(phi1-phi2) gt 1.0E-6)
- i i1
- phi1 phi2
- alpha 1.0/(f2(y1, y2, y3, y4, phi1))
- phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1) - phii phi2
-
- print("By Newton-Raphson method")
- return (list(phi phi2, iteration phi))
38Newton-Raphson Method by R (2)
- gt Newton(125, 18, 20, 24, 0.9)
- 1 "By Newton-Raphson method"
- phi
- 1 0.577876
- iteration
- 1 0.8193054 0.7068499 0.6092827 0.5792747
0.5778784 0.5778760 0.5778760
39Newton-Raphson Method by C/C
40Halleys Method
- The Newton-Raphson iteration function is
- It is possible to speed up convergence by using
more expansion terms than the Newton-Raphson
method does when the object function is very
smooth, like the method by Edmond Halley
(1656-1742)
(http//math.fullerton.edu/mathews/n2003/Halley'sM
ethodMod.html)
41Halleys Method by R (1)
- gt fix(Halley)
- function( y1, y2, y3, y4, initial)
- i 0
- phi NULL
- phi2 initial
- phi1 initial1
- while(abs(phi1-phi2) gt 1.0E-6)
- i i1
- phi1 phi2
- alpha 1.0/(f2(y1, y2, y3, y4, phi1))
- phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1)1.0/(1.0-f1(y1, y2, y3, y4,
phi1)f3(y1, y2, y3, y4, phi1)/(f2(y1, y2, y3,
y4, phi1)f2(y1, y2, y3, y4, phi1)2.0)) - phii phi2
-
- print("By Halley method")
- return (list(phi phi2, iteration phi))
42Halleys Method by R (2)
- gt Halley(125, 18, 20, 24, 0.9)
- 1 "By Halley method"
- phi
- 1 0.577876
- iteration
- 1 0.5028639 0.5793692 0.5778760 0.5778760
43Halleys Method by C/C
44Bisection Method (1)
- Assume that and that there exists a
number such that . IfÂ
and have opposite signs, and represents the
sequence of midpoints generated by the bisection
process, thenand the sequence converges
to . - That is, .
- http//en.wikipedia.org/wiki/Bisection_method
45Bisection Method (2)
1
46Plot the Bisection Method by R
- Bisection method
- y1 125 y2 18 y3 20 y4 24
- f lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4/
phi - x c(1100)0.01
- y f(x)
- plot(x, y, type 'l', main "Bisection method",
xlab expression(varphi), ylab "First
derivative of log likelihood function") - abline(h 0)
- abline(v 0.5, col "red")
- abline(v 0.75, col "red")
- text(0.49, 2200, labels "1")
- text(0.74, 2200, labels "2")
47Bisection Method by R (1)
- gt fix(Bisection)
- function(y1, y2, y3, y4, A, B) A, B is the
boundary of parameter -
- Delta 1.0E-6 Tolerance for width of
interval - Satisfied 0 Condition for loop termination
- phi NULL
- YA f1(y1, y2, y3, y4, A) Compute function
values - YB f1(y1, y2, y3, y4, B)
- Calculation of the maximum number of
iterations - Max as.integer(1floor((log(B-A)-log(Delta))/lo
g(2))) - Check to see if the bisection method applies
- if(((YA gt 0) (YB gt0)) ((YA lt 0) (YB lt
0))) - print("The values of function in boundary point
do not differ in sign.")
48Bisection Method by R (2)
- print("Therefore, this method is not
appropriate here.") - quit() Exit program
-
- for(K in 1Max)
- if(Satisfied 1)
- break
- C (AB)/2 Midpoint of interval
- YC f1(y1, y2, y3, y4, C) Function value at
midpoint - if((K-1) lt 100)
- phiK-1 C
- if(YC 0)
- A C Exact root is found
- B C
-
- else
- if((YBYC) gt 0 )
49Bisection Method by R (3)
- B C Squeeze from the right
- YB YC
-
- else
- A C Squeeze from the left
- YA YC
-
-
- if((B-A) lt Delta)
- Satisfied 1 Check for early convergence
- End of 'for'-loop
- print("By Bisection Method")
- return(list(phi C, iteration phi))
50Bisection Method by R (4)
- gt Bisection(125, 18, 20, 24, 0.25, 1)
- 1 "By Bisection Method"
- phi
- 1 0.5778754
- iteration
- 1 0.4375000 0.5312500 0.5781250 0.5546875
0.5664062 0.5722656 0.5751953 - 8 0.5766602 0.5773926 0.5777588 0.5779419
0.5778503 0.5778961 0.5778732 - 15 0.5778847 0.5778790 0.5778761 0.5778747
0.5778754
51Bisection Method by C/C (1)
52Bisection Method by C/C (2)
53Secant Method
-
-
- More http//en.wikipedia.org/wiki/Secant_method
http//math.fullerton.edu/mathews/n2003/SecantMeth
odMod.html
54Secant Method by R (1)
- gt fix(Secant)
- function(y1, y2, y3, y4, initial1, initial2)
- phi NULL
- phi2 initial1
- phi1 initial2
- alpha (phi2-phi1)/(f1(y1, y2, y3, y4,
phi2)-f1(y1, y2, y3, y4, phi1)) - phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1) - i 0
- while(abs(phi1-phi2) gt 1.0E-6)
- i i1
- phi1 phi2
55Secant Method by R (2)
- alpha (phi2-phi1)/(f1(y1, y2, y3, y4,
phi2)-f1(y1, y2, y3, y4, phi1)) - phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
y3, y4, phi1) - phii phi2
-
- print("By Secant method")
- return (list(phiphi2,iterationphi))
56Secant Method by R (3)
- gt Secant(125, 18, 20, 24, 0.9, 0.05)
- 1 "By Secant method"
- phi
- 1 0.577876
- iteration
- 1 0.2075634 0.4008018 0.5760396 0.5778801
0.5778760 0.5778760
57Secant Method by C/C
58Secant-Bracket Method
- The secant-bracket method is also called the
regular falsi method.
59Secant-Bracket Method by R (1)
- gt fix(RegularFalsi)
- function(y1, y2, y3, y4, A, B)
- phi NULL
- i -1
- Delta 1.0E-6 Tolerance for width of
interval - Satisfied 1 Condition for loop termination
- Endpoints of the interval A,B
- YA f1(y1, y2, y3, y4, A) compute function
values - YB f1(y1, y2, y3, y4, B)
- Check to see if the bisection method applies
- if(((YA gt 0) (YB gt0)) ((YA lt 0) (YB lt
0))) - print("The values of function in boundary point
do not differ in sign") - print("Therefore, this method is not
appropriate here") - q() Exit program
-
60Secant-Bracket Method by R (2)
- while(Satisfied)
- i i1
- C (Bf1(y1, y2, y3, y4, A)-Af1(y1, y2, y3,
y4, B))/(f1(y1, y2, y3, y4, A)-f1(y1, y2, y3, y4,
B)) Midpoint of interval - YC f1(y1, y2, y3, y4, C) Function value at
midpoint - phii C
- if(YC 0) First 'if'
- A C Exact root is found
- B C
- else
- if((YBYC) gt 0 )
- B C Squeeze from the right
- YB YC
61Secant-Bracket Method by R (3)
- else
- A C Squeeze from the left
- YA YC
-
-
- if(f1(y1, y2, y3, y4, C) lt Delta)
- Satisfied 0 Check for early convergence
-
- print("By Regular Falsi Method")
- return(list(phi C, iteration phi))
62Secant-Bracket Method by R (4)
- gt RegularFalsi(y1, y2, y3, y4, 0.9, 0.05)
- 1 "By Regular Falsi Method"
- phi
- 1 0.577876
- iteration
- 1 0.5758648 0.5765007 0.5769352 0.5772324
0.5774356 0.5775746 0.5776698 - 8 0.5777348 0.5777794 0.5778099 0.5778307
0.5778450 0.5778548 0.5778615 - 15 0.5778660 0.5778692 0.5778713 0.5778728
0.5778738 0.5778745 0.5778750 - 22 0.5778753 0.5778755 0.5778756 0.5778757
0.5778758 0.5778759 0.5778759 - 29 0.5778759 0.5778759 0.5778759 0.5778760
0.5778760 0.5778760 0.5778760 - 36 0.5778760 0.5778760
63Secant-Bracket Method by C/C (1)
64Secant-Bracket Method by C/C (2)
65Fisher Scoring Method
- Fisher scoring method replaces
by where is
the Fisher information matrix when the parameter
may be multivariate.
66Fisher Scoring Method by R (1)
- gt fix(Fisher)
- function(y1, y2, y3, y4, initial)
- i 0
- phi NULL
- phi2 initial
- phi1 initial1
- while(abs(phi1-phi2) gt 1.0E-6)
- i i1
- phi1 phi2
- alpha 1.0/I(y1, y2, y3, y4, phi1)
- phi2 phi1-f1(y1, y2, y3, y4, phi1)/I(y1, y2,
y3, y4, phi1) - phii phi2
-
- print("By Fisher method")
- return(list(phi phi2, iteration phi))
67Fisher Scoring Method by R (2)
- gt Fisher(125, 18, 20, 24, 0.9)
- 1 "By Fisher method"
- phi
- 1 0.577876
- iteration
- 1 0.5907181 0.5785331 0.5779100 0.5778777
0.5778761 0.5778760
68Fisher Scoring Method by C/C
69Order of Convergence
- Order of convergence is if
- and for .
- More http//en.wikipedia.org/wiki/Order_of_conver
gence - Note as
- Hence, we can use regression to estimate
70Theorem for Newton-Raphson Method
- If , is a
contraction mapping then
and - If exists,
has a simple zero, then
such that of the Newton-Raphson method
is a contraction mapping and .
71Find Convergence Order by R (1)
- gt Coverage order
- gt Newton method can be substitute for different
method - gt R Newton(y1, y2, y3, y4, initial)
- 1 "By Newton-Raphson method"
- gt temp log(abs(Riteration-Rphi))
- gt y temp2(length(temp)-1)
- gt x temp1(length(temp)-2)
- gt lm(yx)
- Call
- lm(formula y x)
- Coefficients
- (Intercept) x
- 0.6727 2.0441
72Find Convergence Order by R (2)
gtR SimpleIteration(y1, y2, y3, y4, initial) 1
"By parallel chord method(simple iteration)" gt
temp log(abs(Riteration-Rphi)) gt y
temp2(length(temp)-1) gt x temp1(length(temp
)-2) gt lm(yx) Call lm(formula y
x) Coefficients (Intercept) x
-0.01651 1.01809
gt R Secant(y1, y2, y3, y4, initial, 0.05) 1
"By Secant method" gt temp log(abs(Riteration-R
phi)) gt y temp2(length(temp)-1) gt x
temp1(length(temp)-2) gt lm(yx) Call lm(formul
a y x) Coefficients (Intercept)
x -1.245 1.869
gt R Fisher(y1, y2, y3, y4, initial) 1 "By
Fisher method" gt templog(abs(Riteration-Rphi))
gt ytemp2(length(temp)-1) gt xtemp1(length(te
mp)-2) gt lm(yx) Call lm(formula y
x) Coefficients (Intercept) x
-2.942 1.004
gt R Bisection(y1, y2, y3, y4, 0.25, 1) 1 "By
Bisection Method" gt templog(abs(Riteration-Rphi
)) gt ytemp2(length(temp)-1) gt
xtemp1(length(temp)-2) gt lm(yx) Call lm(form
ula y x) Coefficients (Intercept)
x -1.9803 0.8448
73Find Convergence Order by R (3)
- gt R RegularFalsi(y1, y2, y3, y4, initial, 0.05)
- 1 "By Regular Falsi Method"
- gt temp log(abs(Riteration-Rphi))
- gt y temp2(length(temp)-1)
- gt x temp1(length(temp)-2)
- gt lm(yx)
- Call
- lm(formula y x)
- Coefficients
- (Intercept) x
- -0.2415 1.0134
74Find Convergence Order by C/C
75Exercises
- Write your own programs for those examples
presented in this talk. - Write programs for those examples mentioned at
the following web page - http//en.wikipedia.org/wiki/Maximum_likelihood
- Write programs for the other examples that you
know.
76More Exercises (1)
- Example 3 in genetics
- The observed data are
- where , , and fall in
- such that
- Find the MLEs for , , and .
77More Exercises (2)
- Example 4 in the positron emission tomography
(PET) - The observed data are
- and
- The values of are known and the unknown
parameters are . - Find the MLEs for .
78More Exercises (3)
- Example 5 in the normal mixture
- The observed data are random
samples from the following probability density
function - Find the MLEs for the following parameters