Maximum Likelihood Estimates and the EM Algorithms II - PowerPoint PPT Presentation

1 / 78
About This Presentation
Title:

Maximum Likelihood Estimates and the EM Algorithms II

Description:

Maximum Likelihood Estimates and the EM Algorithms II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu_at_stat.nctu.edu.tw – PowerPoint PPT presentation

Number of Views:44
Avg rating:3.0/5.0
Slides: 79
Provided by: tigpbpIis
Category:

less

Transcript and Presenter's Notes

Title: Maximum Likelihood Estimates and the EM Algorithms II


1
Maximum Likelihood Estimates and the EM
Algorithms II
  • Henry Horng-Shing Lu
  • Institute of Statistics
  • National Chiao Tung University
  • hslu_at_stat.nctu.edu.tw
  • http//tigpbp.iis.sinica.edu.tw/courses.htm

2
Part 1Computation Tools
3
Include Functions in R
  • source("file path")
  • Example
  • In MME.R
  • In R

MMEfunction(y1, y2, y3, y4) ny1y2y3y4 ph
i14.0(y1/n-0.5) phi21-4y2/n phi31-4y3/n
phi44.0y4/n phi(phi1phi2phi3phi4)/4.0
print("By MME method") return(phi)
print(phi)
gt source(C/MME.R) gt MME(125, 18, 20, 24) 1
"By MME method" 1 0.5935829
4
Part 2Motivation Examples
5
Example 1 in Genetics (1)
  • Two linked loci with alleles A and a, and B and b
  • A, B dominant
  • a, b recessive
  • A double heterozygote AaBb will produce gametes
    of four types AB, Ab, aB, ab

6
Example 1 in Genetics (2)
  • Probabilities for genotypes in gametes

No Recombination Recombination
Male 1-r r
Female 1-r r
AB ab aB Ab
Male (1-r)/2 (1-r)/2 r/2 r/2
Female (1-r)/2 (1-r)/2 r/2 r/2
7
Example 1 in Genetics (3)
  • Fisher, R. A. and Balmukand, B. (1928). The
    estimation of linkage from the offspring of
    selfed heterozygotes. Journal of Genetics, 20,
    7992.
  • More
  • http//en.wikipedia.org/wiki/Genetics
    http//www2.isye.gatech.edu/brani/isyebayes/bank/
    handout12.pdf

8
Example 1 in Genetics (4)
MALE MALE MALE MALE
AB (1-r)/2 ab (1-r)/2 aB r/2 Ab r/2
F E M A L E AB (1-r)/2 AABB (1-r) (1-r)/4 aABb (1-r) (1-r)/4 aABB r (1-r)/4 AABb r (1-r)/4
F E M A L E ab (1-r)/2 AaBb (1-r) (1-r)/4 aabb (1-r) (1-r)/4 aaBb r (1-r)/4 Aabb r (1-r)/4
F E M A L E aB r/2 AaBB (1-r) r/4 aabB (1-r) r/4 aaBB r r/4 AabB r r/4
F E M A L E Ab r/2 AABb (1-r) r/4 aAbb (1-r) r/4 aABb r r/4 AAbb r r/4
9
Example 1 in Genetics (5)
  • Four distinct phenotypes
  • AB, Ab, aB and ab.
  • A the dominant phenotype from (Aa, AA, aA).
  • a the recessive phenotype from aa.
  • B the dominant phenotype from (Bb, BB, bB).
  • b the recessive phenotype from bb.
  • AB 9 gametic combinations.
  • Ab 3 gametic combinations.
  • aB 3 gametic combinations.
  • ab 1 gametic combination.
  • Total 16 combinations.

10
Example 1 in Genetics (6)
  • Let , then

11
Example 1 in Genetics (7)
  • Hence, the random sample of n from the offspring
    of selfed heterozygotes will follow a multinomial
    distribution
  • We know that
  • and
  • So

12
Example 1 in Genetics (8)
  • Suppose that we observe the data of
  • which is a random sample from
  • Then the probability mass function is

13
Maximum Likelihood Estimate (MLE)
  • Likelihood
  • Maximize likelihood Solve the score equations,
    which are setting the first derivates of
    likelihood to be zeros.
  • Under regular conditions, the MLE is consistent,
    asymptotic efficient and normal!
  • More
  • http//en.wikipedia.org/wiki/Maximum_likelihood

14
MLE for Example 1 (1)
  • Likelihood
  • MLE

15
MLE for Example 1 (2)
A
B
C
16
MLE for Example 1 (3)
  • Checking
  • 1.
  • 2.
  • 3. Compare ?

17
Part 3Numerical Solutions for the Score
Equations of MLEs
18
A Banach Space
  • A Banach space is a vector space over the
    field such that
  • Every Cauchy sequence of converges in
    (i.e., is complete).
  • More
  • http//en.wikipedia.org/wiki/Banach_space

19
Lipschitz Continuous
  • A closed subset and mapping
  • is Lipschitz continuous on with
  • if
  • is a contraction mapping on if is
    Lipschitz continuous and
  • More
  • http//en.wikipedia.org/wiki/Lipschitz_continuous

20
Fixed Point Theorem (1)
  • If is a contraction mapping on if is
    Lipschitz continuous and
  • has an unique fixed point
  • such that
  • initial

21
Fixed Point Theorem (2)
  • More
  • http//en.wikipedia.org/wiki/Fixed-point_theorem
  • http//www.math-linux.com/spip.php?article60

22
Applications for MLE (1)
  • Numerical solution
  • is a contraction mapping
  • initial value that
  • Then

23
Applications for MLE (2)
  • How to choose s.t. is a contraction
    mapping?
  • Optimal ?

24
Parallel Chord Method (1)
  • Parallel chord method is also called simple
    iteration.

25
Parallel Chord Method (2)
26
Plot Parallel Chord Method by R (1)
  • Simple iteration
  • y1 125 y2 18 y3 20 y4 24
  • First and second derivatives of log likelihood
  • f1 lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4
    /phi
  • f2 lt- function(phi) (-1)y1(2phi)(-2)-(y2y3)
    (1-phi)(-2)-y4(phi)(-2)
  • x c(1080)0.01
  • y f1(x)
  • plot(x, y, type 'l', main "Parallel chord
    method", xlab expression(varphi), ylab "First
    derivative of log likelihood function")
  • abline(h 0)

27
Plot Parallel Chord Method by R (2)
  • phi0 0.25 Given the initial value 0.25
  • segments(0, f1(phi0), phi0, f1(phi0), col
    "green", lty 2)
  • segments(phi0, f1(phi0), phi0, -200, col
    "green", lty 2)
  • Use the tangent line to find the intercept b0
  • b0 f1(phi0)-f2(phi0)phi0
  • curve(f2(phi0)xb0, add T, col "red")
  • phi1 -b0/f2(phi0) Find the closer phi
  • segments(phi1, -200, phi1, f1(phi1), col
    "green", lty 2)
  • segments(0, f1(phi1), phi1, f1(phi1), col
    "green", lty 2)
  • Use the parallel line to find the intercept b1
  • b1 f1(phi1)-f2(phi0)phi1
  • curve(f2(phi0)xb1, add T, col "red")

28
Define Functions for Example 1 in R
  • We will define some functions and variables for
    finding the MLE in Example 1 by R

Fist, second and third derivatives of log
likelihood f1 function(y1, y2, y3, y4,
phi)y1/(2phi)-(y2y3)/(1-phi)y4/phi f2
function(y1, y2, y3, y4, phi) (-1)y1(2phi)(-2
)-(y2y3)(1-phi)(-2)-y4(phi)(-2) f3
function(y1, y2, y3, y4, phi) 2y1(2phi)(-3)-2
(y2y3)(1-phi)(-3)2y4(phi)(-3) Fisher
Information I function(y1, y2, y3, y4, phi)
(-1)(y1y2y3y4)(1/(4(2phi))1/(2(1-phi))1
/(4phi)) y1 125 y2 18 y3 20 y4 24
initial 0.9
29
Parallel Chord Method by R (1)
  • gt fix(SimpleIteration)
  • function(y1, y2, y3, y4, initial)
  • phi NULL
  • i 0
  • alpha -1.0/f2(y1, y2, y3, y4, initial)
  • phi2 initial
  • phi1 initial1
  • while(abs(phi1-phi2) gt 1.0E-5)
  • i i1
  • phi1 phi2
  • phi2 alphaf1(y1, y2, y3, y4, phi1)phi1
  • phii phi2
  • print("By parallel chord method(simple
    iteration)")
  • return(list(phi phi2, iteration phi))

30
Parallel Chord Method by R (2)
  • gt SimpleIteration(y1, y2, y3, y4, initial)

31
Parallel Chord Method by C/C
32
Newton-Raphson Method (1)
  • http//math.fullerton.edu/mathews/n2003/Newton'sMe
    thodMod.html
  • http//en.wikipedia.org/wiki/Newton27s_method

33
Newton-Raphson Method (2)
34
Plot Newton-Raphson Method by R (1)
  • Newton-Raphson Method
  • y1 125 y2 18 y3 20 y4 24
  • First and second derivatives of log likelihood
  • f1 lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4
    /phi
  • f2 lt- function(phi) (-1)y1(2phi)(-2)-(y2y3)
    (1-phi)(-2)-y4(phi)(-2)
  • x c(1080)0.01
  • y f1(x)
  • plot(x, y, type 'l', main "Newton-Raphson
    method", xlab expression(varphi), ylab "First
    derivative of log likelihood function")
  • abline(h 0)

35
Plot Newton-Raphson Method by R (2)
  • Given the initial value 0.25
  • phi0 0.25
  • segments(0, f1(phi0), phi0, f1(phi0), col
    "green", lty 2)
  • segments(phi0, f1(phi0), phi0, -200, col
    "green", lty 2)
  • Use the tangent line to find the intercept b0
  • b0 f1(phi0)-f2(phi0)phi0
  • curve(f2(phi0)xb0, add T, col "purple", lwd
    2)
  • Find the closer phi
  • phi1 -b0/f2(phi0)
  • segments(phi1, -200, phi1, f1(phi1), col
    "green", lty 2)
  • segments(0, f1(phi1), phi1, f1(phi1), col
    "green", lty 2)

36
Plot Newton-Raphson Method by R (3)
  • Use the parallel line to find the intercept b1
  • b1 f1(phi1)-f2(phi0)phi1
  • curve(f2(phi0)xb1, add T, col "red")
  • curve(f2(phi1)x-f2(phi1)phi1f1(phi1), add T,
    col "blue", lwd 2)
  • legend(0.45, 250, c("Newton-Raphson", "Parallel
    chord method"), col c("blue", "red"), lty
    c(1, 1))

37
Newton-Raphson Method by R (1)
  • gt fix(Newton)
  • function(y1, y2, y3, y4, initial)
  • i 0
  • phi NULL
  • phi2 initial
  • phi1 initial1
  • while(abs(phi1-phi2) gt 1.0E-6)
  • i i1
  • phi1 phi2
  • alpha 1.0/(f2(y1, y2, y3, y4, phi1))
  • phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
    y3, y4, phi1)
  • phii phi2
  • print("By Newton-Raphson method")
  • return (list(phi phi2, iteration phi))

38
Newton-Raphson Method by R (2)
  • gt Newton(125, 18, 20, 24, 0.9)
  • 1 "By Newton-Raphson method"
  • phi
  • 1 0.577876
  • iteration
  • 1 0.8193054 0.7068499 0.6092827 0.5792747
    0.5778784 0.5778760 0.5778760

39
Newton-Raphson Method by C/C
40
Halleys Method
  • The Newton-Raphson iteration function is
  • It is possible to speed up convergence by using
    more expansion terms than the Newton-Raphson
    method does when the object function is very
    smooth, like the method by Edmond Halley
    (1656-1742)

(http//math.fullerton.edu/mathews/n2003/Halley'sM
ethodMod.html)
41
Halleys Method by R (1)
  • gt fix(Halley)
  • function( y1, y2, y3, y4, initial)
  • i 0
  • phi NULL
  • phi2 initial
  • phi1 initial1
  • while(abs(phi1-phi2) gt 1.0E-6)
  • i i1
  • phi1 phi2
  • alpha 1.0/(f2(y1, y2, y3, y4, phi1))
  • phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
    y3, y4, phi1)1.0/(1.0-f1(y1, y2, y3, y4,
    phi1)f3(y1, y2, y3, y4, phi1)/(f2(y1, y2, y3,
    y4, phi1)f2(y1, y2, y3, y4, phi1)2.0))
  • phii phi2
  • print("By Halley method")
  • return (list(phi phi2, iteration phi))

42
Halleys Method by R (2)
  • gt Halley(125, 18, 20, 24, 0.9)
  • 1 "By Halley method"
  • phi
  • 1 0.577876
  • iteration
  • 1 0.5028639 0.5793692 0.5778760 0.5778760

43
Halleys Method by C/C
44
Bisection Method (1)
  • Assume that  and that there exists a
    number such that . If 
    and have opposite signs, and represents the
    sequence of midpoints generated by the bisection
    process, thenand the sequence converges
    to .
  • That is,  .
  • http//en.wikipedia.org/wiki/Bisection_method

45
Bisection Method (2)
1
46
Plot the Bisection Method by R
  • Bisection method
  • y1 125 y2 18 y3 20 y4 24
  • f lt- function(phi) y1/(2phi)-(y2y3)/(1-phi)y4/
    phi
  • x c(1100)0.01
  • y f(x)
  • plot(x, y, type 'l', main "Bisection method",
    xlab expression(varphi), ylab "First
    derivative of log likelihood function")
  • abline(h 0)
  • abline(v 0.5, col "red")
  • abline(v 0.75, col "red")
  • text(0.49, 2200, labels "1")
  • text(0.74, 2200, labels "2")

47
Bisection Method by R (1)
  • gt fix(Bisection)
  • function(y1, y2, y3, y4, A, B) A, B is the
    boundary of parameter
  • Delta 1.0E-6 Tolerance for width of
    interval
  • Satisfied 0 Condition for loop termination
  • phi NULL
  • YA f1(y1, y2, y3, y4, A) Compute function
    values
  • YB f1(y1, y2, y3, y4, B)
  • Calculation of the maximum number of
    iterations
  • Max as.integer(1floor((log(B-A)-log(Delta))/lo
    g(2)))
  • Check to see if the bisection method applies
  • if(((YA gt 0) (YB gt0)) ((YA lt 0) (YB lt
    0)))
  • print("The values of function in boundary point
    do not differ in sign.")

48
Bisection Method by R (2)
  • print("Therefore, this method is not
    appropriate here.")
  • quit() Exit program
  • for(K in 1Max)
  • if(Satisfied 1)
  • break
  • C (AB)/2 Midpoint of interval
  • YC f1(y1, y2, y3, y4, C) Function value at
    midpoint
  • if((K-1) lt 100)
  • phiK-1 C
  • if(YC 0)
  • A C Exact root is found
  • B C
  • else
  • if((YBYC) gt 0 )

49
Bisection Method by R (3)
  • B C Squeeze from the right
  • YB YC
  • else
  • A C Squeeze from the left
  • YA YC
  • if((B-A) lt Delta)
  • Satisfied 1 Check for early convergence
  • End of 'for'-loop
  • print("By Bisection Method")
  • return(list(phi C, iteration phi))

50
Bisection Method by R (4)
  • gt Bisection(125, 18, 20, 24, 0.25, 1)
  • 1 "By Bisection Method"
  • phi
  • 1 0.5778754
  • iteration
  • 1 0.4375000 0.5312500 0.5781250 0.5546875
    0.5664062 0.5722656 0.5751953
  • 8 0.5766602 0.5773926 0.5777588 0.5779419
    0.5778503 0.5778961 0.5778732
  • 15 0.5778847 0.5778790 0.5778761 0.5778747
    0.5778754

51
Bisection Method by C/C (1)
52
Bisection Method by C/C (2)
53
Secant Method
  • More http//en.wikipedia.org/wiki/Secant_method
    http//math.fullerton.edu/mathews/n2003/SecantMeth
    odMod.html

54
Secant Method by R (1)
  • gt fix(Secant)
  • function(y1, y2, y3, y4, initial1, initial2)
  • phi NULL
  • phi2 initial1
  • phi1 initial2
  • alpha (phi2-phi1)/(f1(y1, y2, y3, y4,
    phi2)-f1(y1, y2, y3, y4, phi1))
  • phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
    y3, y4, phi1)
  • i 0
  • while(abs(phi1-phi2) gt 1.0E-6)
  • i i1
  • phi1 phi2

55
Secant Method by R (2)
  • alpha (phi2-phi1)/(f1(y1, y2, y3, y4,
    phi2)-f1(y1, y2, y3, y4, phi1))
  • phi2 phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2,
    y3, y4, phi1)
  • phii phi2
  • print("By Secant method")
  • return (list(phiphi2,iterationphi))

56
Secant Method by R (3)
  • gt Secant(125, 18, 20, 24, 0.9, 0.05)
  • 1 "By Secant method"
  • phi
  • 1 0.577876
  • iteration
  • 1 0.2075634 0.4008018 0.5760396 0.5778801
    0.5778760 0.5778760

57
Secant Method by C/C
58
Secant-Bracket Method
  • The secant-bracket method is also called the
    regular falsi method.

59
Secant-Bracket Method by R (1)
  • gt fix(RegularFalsi)
  • function(y1, y2, y3, y4, A, B)
  • phi NULL
  • i -1
  • Delta 1.0E-6 Tolerance for width of
    interval
  • Satisfied 1 Condition for loop termination
  • Endpoints of the interval A,B
  • YA f1(y1, y2, y3, y4, A) compute function
    values
  • YB f1(y1, y2, y3, y4, B)
  • Check to see if the bisection method applies
  • if(((YA gt 0) (YB gt0)) ((YA lt 0) (YB lt
    0)))
  • print("The values of function in boundary point
    do not differ in sign")
  • print("Therefore, this method is not
    appropriate here")
  • q() Exit program

60
Secant-Bracket Method by R (2)
  • while(Satisfied)
  • i i1
  • C (Bf1(y1, y2, y3, y4, A)-Af1(y1, y2, y3,
    y4, B))/(f1(y1, y2, y3, y4, A)-f1(y1, y2, y3, y4,
    B)) Midpoint of interval
  • YC f1(y1, y2, y3, y4, C) Function value at
    midpoint
  • phii C
  • if(YC 0) First 'if'
  • A C Exact root is found
  • B C
  • else
  • if((YBYC) gt 0 )
  • B C Squeeze from the right
  • YB YC

61
Secant-Bracket Method by R (3)
  • else
  • A C Squeeze from the left
  • YA YC
  • if(f1(y1, y2, y3, y4, C) lt Delta)
  • Satisfied 0 Check for early convergence
  • print("By Regular Falsi Method")
  • return(list(phi C, iteration phi))

62
Secant-Bracket Method by R (4)
  • gt RegularFalsi(y1, y2, y3, y4, 0.9, 0.05)
  • 1 "By Regular Falsi Method"
  • phi
  • 1 0.577876
  • iteration
  • 1 0.5758648 0.5765007 0.5769352 0.5772324
    0.5774356 0.5775746 0.5776698
  • 8 0.5777348 0.5777794 0.5778099 0.5778307
    0.5778450 0.5778548 0.5778615
  • 15 0.5778660 0.5778692 0.5778713 0.5778728
    0.5778738 0.5778745 0.5778750
  • 22 0.5778753 0.5778755 0.5778756 0.5778757
    0.5778758 0.5778759 0.5778759
  • 29 0.5778759 0.5778759 0.5778759 0.5778760
    0.5778760 0.5778760 0.5778760
  • 36 0.5778760 0.5778760

63
Secant-Bracket Method by C/C (1)
64
Secant-Bracket Method by C/C (2)
65
Fisher Scoring Method
  • Fisher scoring method replaces
    by where is
    the Fisher information matrix when the parameter
    may be multivariate.

66
Fisher Scoring Method by R (1)
  • gt fix(Fisher)
  • function(y1, y2, y3, y4, initial)
  • i 0
  • phi NULL
  • phi2 initial
  • phi1 initial1
  • while(abs(phi1-phi2) gt 1.0E-6)
  • i i1
  • phi1 phi2
  • alpha 1.0/I(y1, y2, y3, y4, phi1)
  • phi2 phi1-f1(y1, y2, y3, y4, phi1)/I(y1, y2,
    y3, y4, phi1)
  • phii phi2
  • print("By Fisher method")
  • return(list(phi phi2, iteration phi))

67
Fisher Scoring Method by R (2)
  • gt Fisher(125, 18, 20, 24, 0.9)
  • 1 "By Fisher method"
  • phi
  • 1 0.577876
  • iteration
  • 1 0.5907181 0.5785331 0.5779100 0.5778777
    0.5778761 0.5778760

68
Fisher Scoring Method by C/C
69
Order of Convergence
  • Order of convergence is if
  • and for .
  • More http//en.wikipedia.org/wiki/Order_of_conver
    gence
  • Note as
  • Hence, we can use regression to estimate

70
Theorem for Newton-Raphson Method
  • If , is a
    contraction mapping then
    and
  • If exists,
    has a simple zero, then
    such that of the Newton-Raphson method
    is a contraction mapping and .

71
Find Convergence Order by R (1)
  • gt Coverage order
  • gt Newton method can be substitute for different
    method
  • gt R Newton(y1, y2, y3, y4, initial)
  • 1 "By Newton-Raphson method"
  • gt temp log(abs(Riteration-Rphi))
  • gt y temp2(length(temp)-1)
  • gt x temp1(length(temp)-2)
  • gt lm(yx)
  • Call
  • lm(formula y x)
  • Coefficients
  • (Intercept) x
  • 0.6727 2.0441

72
Find Convergence Order by R (2)
gtR SimpleIteration(y1, y2, y3, y4, initial) 1
"By parallel chord method(simple iteration)" gt
temp log(abs(Riteration-Rphi)) gt y
temp2(length(temp)-1) gt x temp1(length(temp
)-2) gt lm(yx) Call lm(formula y
x) Coefficients (Intercept) x
-0.01651 1.01809
gt R Secant(y1, y2, y3, y4, initial, 0.05) 1
"By Secant method" gt temp log(abs(Riteration-R
phi)) gt y temp2(length(temp)-1) gt x
temp1(length(temp)-2) gt lm(yx) Call lm(formul
a y x) Coefficients (Intercept)
x -1.245 1.869
gt R Fisher(y1, y2, y3, y4, initial) 1 "By
Fisher method" gt templog(abs(Riteration-Rphi))
gt ytemp2(length(temp)-1) gt xtemp1(length(te
mp)-2) gt lm(yx) Call lm(formula y
x) Coefficients (Intercept) x
-2.942 1.004
gt R Bisection(y1, y2, y3, y4, 0.25, 1) 1 "By
Bisection Method" gt templog(abs(Riteration-Rphi
)) gt ytemp2(length(temp)-1) gt
xtemp1(length(temp)-2) gt lm(yx) Call lm(form
ula y x) Coefficients (Intercept)
x -1.9803 0.8448
73
Find Convergence Order by R (3)
  • gt R RegularFalsi(y1, y2, y3, y4, initial, 0.05)
  • 1 "By Regular Falsi Method"
  • gt temp log(abs(Riteration-Rphi))
  • gt y temp2(length(temp)-1)
  • gt x temp1(length(temp)-2)
  • gt lm(yx)
  • Call
  • lm(formula y x)
  • Coefficients
  • (Intercept) x
  • -0.2415 1.0134

74
Find Convergence Order by C/C
75
Exercises
  • Write your own programs for those examples
    presented in this talk.
  • Write programs for those examples mentioned at
    the following web page
  • http//en.wikipedia.org/wiki/Maximum_likelihood
  • Write programs for the other examples that you
    know.

76
More Exercises (1)
  • Example 3 in genetics
  • The observed data are
  • where , , and fall in
  • such that
  • Find the MLEs for , , and .

77
More Exercises (2)
  • Example 4 in the positron emission tomography
    (PET)
  • The observed data are
  • and
  • The values of are known and the unknown
    parameters are .
  • Find the MLEs for .

78
More Exercises (3)
  • Example 5 in the normal mixture
  • The observed data are random
    samples from the following probability density
    function
  • Find the MLEs for the following parameters
Write a Comment
User Comments (0)
About PowerShow.com