1
Maximum Likelihood Estimation
2
Maximum Likelihood Estimation (MLE)
  • Most statistical methods are designed to minimize error.
  • They choose the parameter values that minimize predictive error, y − ŷ, or squared error, (y − ŷ)².
  • Maximum likelihood estimation instead seeks the parameter values that are most likely to have produced the observed distribution.

3
Likelihood and PDFs
  • For a continuous variable, the likelihood of a
    particular value is obtained from the PDF
    (probability density function).

[Figure: example Gamma and Normal PDFs]
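As an aside not on the slide, a minimal SciPy sketch of reading a likelihood off a PDF (the two distributions and the value x = 2 are arbitrary illustrative choices):

  # The likelihood of one observation under a continuous distribution
  # is the height of the PDF at that value, not a probability.
  from scipy.stats import norm, gamma

  x = 2.0
  print(norm.pdf(x, loc=0, scale=1))     # likelihood of x under Normal(0, 1)
  print(gamma.pdf(x, a=2.0, scale=1.0))  # likelihood of x under Gamma(shape 2)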
4
Likelihood ≠ Probability (for continuous
distributions)
P(x = 2) = 0
P(x ≤ 2) = red AUC
[Figure: a PDF with the area under the curve to the left of x = 2 shaded red]
5
Maximum likelihood estimates of parameters
  • For MLE, the goal is to determine the most likely values of the population parameters (e.g., µ, σ, β, ρ, …) given observed sample values (e.g., x-bar, s, b, r, …).
  • Any model's parameters (e.g., β in linear regression; a, b, c, etc. in nonlinear models; weights in backprop) can be estimated using MLE.

6
Likelihood is based on the shape of the d.v.'s
distribution!!!
  • ANOVA, Pearson's r, the t-test, and regression all assume that the d.v. is normally distributed.
  • Under those conditions, the LSE (least squares estimate) is the MLE.
  • If the d.v. is not normally distributed, the LSE is not the MLE.
  • So, the first step is to determine the shape of the distribution of your d.v.

7
Step 1: Identify the distribution
  • Normal, lognormal, beta, gamma, binomial, multinomial, Weibull, Poisson, exponential, …
  • AAAHHH!
  • Precision isn't critical unless the sample size is huge.
  • Most stats packages can fit a d.v. distribution using various distribution classes.
  • In JMP, do a distribution analysis and then try various distribution fits (see the sketch after this list).
  • Note: 0 and negative values are illegal for some fits.
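A hedged Python sketch of this step for readers without JMP; the candidate list and the simulated data are illustrative assumptions, not from the slides:

  # Fit several candidate distributions by MLE and compare log-likelihoods.
  import numpy as np
  from scipy import stats

  y = np.random.default_rng(1).gamma(shape=2.0, scale=3.0, size=200)  # fake d.v.

  for dist in (stats.norm, stats.lognorm, stats.gamma):
      params = dist.fit(y)                # MLE estimates of the parameters
      ll = dist.logpdf(y, *params).sum()  # log-likelihood of the data
      print(dist.name, round(ll, 1))      # higher means a better fit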

8
Step 2: Choose the analysis
  • If only looking at linear models and fixed effects, use GLM.
  • GLM allows you to specify the d.v. distribution type (see the sketch after this list).
  • (You'll know you have an MLE method if the output includes likelihoods.)
  • Random effects will be considered later.
  • Otherwise, you need to modify your fitting method to use a different loss function.
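A sketch of the same step in Python's statsmodels rather than JMP/SPSS; the Poisson d.v. and coefficients are invented for illustration:

  # GLM lets you declare the d.v. distribution via a "family";
  # the printed log-likelihood confirms this is an MLE method.
  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(0)
  x = rng.normal(size=100)
  y = rng.poisson(np.exp(0.5 + 0.8 * x))   # count d.v.

  X = sm.add_constant(x)
  result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
  print(result.llf)                        # the likelihood in the output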

9
GLM Distributions and Link Functions
  • The distribution specifies the nature of the error distribution.
  • The link function g provides the relationship between the linear predictor Xβ and the mean of the distribution: g(E(Y)) = Xβ.
  • Most often, the distribution determines the best link function (the so-called canonical link function).
  • For example, a Y distribution that ranges between 0 and 1 (binomial) necessitates a link function that is likewise constrained (logit, probit, complementary log-log); see the sketch below.
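To make that constraint concrete, a tiny hand-rolled sketch of the logit link (an illustration, not a library API):

  import numpy as np

  def logit(mu):        # g(mu) = log(mu / (1 - mu)), defined on (0, 1)
      return np.log(mu / (1 - mu))

  def inv_logit(eta):   # g^(-1)(eta) = E(Y); any real eta maps into (0, 1)
      return 1 / (1 + np.exp(-eta))

  print(inv_logit(-3.0), inv_logit(0.0), inv_logit(3.0))  # 0.047, 0.5, 0.953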

10
Distributions and Link Functions
  • Distributions
  • In JMP: Normal, binomial (0/1), Poisson (count data), exponential (positive continuous)
  • Link functions
  • In JMP: Identity, logit, probit, log, reciprocal, power, complementary log-log

11
Poisson distribution for count data (courtesy of Wikipedia)
[Figure: Poisson PMFs for several values of λ]
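Since the figure itself is not in the transcript, a short SciPy sketch of the Poisson PMF for count data (λ = 4 is an arbitrary choice):

  # P(k) = lambda^k * exp(-lambda) / k!
  from scipy.stats import poisson

  for k in range(9):
      print(k, round(poisson.pmf(k, 4.0), 3))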
12
http://www.mathworks.com/products/demos/statistics/glmdemo.html
13
Canonical Link Functions
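The slide's table did not survive the transcript; the standard canonical pairings (a textbook reference, not recovered from the slide image) are:

  Distribution   Canonical link   g(µ)
  Normal         identity         µ
  Binomial       logit            log(µ / (1 − µ))
  Poisson        log              log(µ)
  Gamma          reciprocal       1/µ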
14
Step 3: Loss functions
  • LSE uses (y − ŷ)² as the loss function and tries to minimize the sum of this quantity across rows → SSE.
  • MLE loss functions depend on the assumed distribution of the d.v.

15
MLE Loss functions
  • The likelihood function is the joint probability of all of the data.
  • For example, P(y₁ | µ = 2) for row 1, and P(y₂ | µ = 2) for row 2, and P(y₃ | µ = 2) for row 3, …
  • … which equals the product L(µ = 2) = ∏ᵢ P(yᵢ | µ = 2).
  • It's mathematically easier to deal with sums, so we'll take the log of that quantity: log L = Σᵢ log P(yᵢ | µ = 2).
16
MLE Loss functions, cont.
  • Now we have something that can be computed for each row and summed.
  • But we want the maximum of that last equation, whereas loss functions should be minimized.
  • Easy! We'll just negate it.
  • The negative log-likelihood, −Σᵢ log P(yᵢ | parameters), becomes our loss function (see the sketch below).
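A minimal sketch of that loss function in Python, assuming a normally distributed d.v. and made-up data (both assumptions, not from the slides):

  # Negative log-likelihood: sum -log f(y_i | mu, sigma) across rows.
  import numpy as np
  from scipy.stats import norm

  def nll(params, y):
      mu, sigma = params
      return -norm.logpdf(y, loc=mu, scale=sigma).sum()

  y = np.array([4.1, 5.0, 3.8, 5.5, 4.6])
  print(nll((4.0, 1.0), y))   # loss at mu = 4, sigma = 1
  print(nll((4.6, 0.6), y))   # parameters nearer the data give a smaller loss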

17
So, once you know the PDF …
  • take the log of the function and negate it.
  • This doesn't move the extremum of the PDF: the log preserves its location, and the negation merely turns the maximum into a minimum at the same point.

18
[Figure: three panels showing the PDF, log(PDF), and −log(PDF); the extremum sits at the same parameter value in each]
19
Step 4: Find Parameter Values that Maximize
Likelihood (i.e., that Minimize −LL)
  • There are no general closed-form solutions, so iterative methods are again used (a sketch follows).
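A sketch of the iterative search with scipy.optimize, using the same made-up data as above (optimizing log-sigma is a common reparameterization, not something the slides prescribe):

  import numpy as np
  from scipy.optimize import minimize
  from scipy.stats import norm

  y = np.array([4.1, 5.0, 3.8, 5.5, 4.6])

  def nll(params):
      mu, log_sigma = params             # optimizing log(sigma) keeps sigma > 0
      return -norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)).sum()

  fit = minimize(nll, x0=[0.0, 0.0])     # iterative minimization (BFGS default)
  print(fit.x[0], np.exp(fit.x[1]))      # approx. the sample mean and ML s.d.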

20
Step 5: Model comparison in MLE
  • Model choice using AIC/BIC:
  • AIC = −2LL + 2k;  BIC = −2LL + k·ln(n)
  • Computation of the likelihood ratio (LR):
  • LR = L(model 1) / L(model 2)
  • USE LR ONLY FOR NESTED MODELS!
[Diagram: likelihood-ratio scale between the two models; LR = 2 favors Model 1, LR = 1 favors neither, LR = 0.2 favors Model 2]
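A small sketch of these comparisons in Python; the two log-likelihoods and parameter counts are hypothetical numbers:

  import numpy as np

  def aic(ll, k):            # AIC = -2LL + 2k
      return -2 * ll + 2 * k

  def bic(ll, k, n):         # BIC = -2LL + k * ln(n)
      return -2 * ll + k * np.log(n)

  ll1, k1 = -120.3, 3        # hypothetical model 1
  ll2, k2 = -118.9, 5        # hypothetical model 2 (model 1 nested within it)

  print(aic(ll1, k1), aic(ll2, k2))
  print(bic(ll1, k1, 100), bic(ll2, k2, 100))
  print(np.exp(ll1 - ll2))   # likelihood ratio L(model 1) / L(model 2)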
21
Repeated Measures MLE
  • What if you have repeated-measures data and want to do MLE?
  • You need to be able to specify fixed (typically your IVs) and random (subject) effects.
  • Solutions:
  • GLM with fixed and random effects (SPSS)
  • GLM in SPSS does support random effects, but it estimates their parameters as if they were fixed.
  • GLM can't handle missing data without deleting rows.
  • GLM requires an equal number of observations for each subject.
  • GLM requires the sphericity assumption to be met, etc.
  • Mixed effects modeling
  • Linear mixed effects
  • Nonlinear mixed effects

22
Mixed Effects Modeling
  • Some classic methods are special cases of mixed effects modeling:
  • HLM (hierarchical linear modeling): lme
  • Growth curve analysis: lme, lme with transformed y, nlme
  • Some growth curve analysis is different
  • Multilevel models: lme
  • Random coefficient models: lme

23
What is it?
  • Uses MLE
  • Includes fixed and random effects (hence "mixed")
  • Can estimate parameters at different levels (hence "multilevel modeling")
  • The group parameter estimate is a function of the individual parameter estimates, and vice versa
  • Ditto for each higher level of group parameters (e.g., state/county/school/grade/class/subject)
  • Allows estimation of the variance-covariance structure:
  • Repeated measures ANOVA assumes compound symmetry
  • MANOVA assumes an unstructured matrix
  • Mixed effects modeling lets you fit various types of structures, from unstructured to identity (homogeneity of variance); a sketch follows
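A hedged sketch of a linear mixed-effects fit in Python's statsmodels, with simulated repeated-measures data (the slides discuss JMP/SPSS and R's lme/nlme; this is an equivalent tool, not the author's):

  # Fixed effect of x, random intercept per subject, fit by REML.
  import numpy as np
  import pandas as pd
  import statsmodels.formula.api as smf

  rng = np.random.default_rng(0)
  subjects = np.repeat(np.arange(10), 8)            # 10 subjects, 8 obs each
  x = np.tile(np.arange(8.0), 10)
  y = 2.0 + 0.5 * x + rng.normal(0, 1, 10)[subjects] + rng.normal(0, 0.5, 80)

  data = pd.DataFrame({"y": y, "x": x, "subject": subjects})
  result = smf.mixedlm("y ~ x", data, groups=data["subject"]).fit(reml=True)
  print(result.summary())                           # fixed effects + variances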

24
Advantages of mixed effects modeling
  • Uses all available data
  • When data are missing, estimates are simply less certain (recall MLE!)
  • Allows testing of various variance-covariance structures, and is thus neither too conservative nor too liberal
  • Can allow parameter values to vary or be fixed at each level
  • Fits individual subjects, not the average subject
  • The average subject can be an artifact (esp. for nonlinear fits)
  • Example: learning curves, where average sensitivities differ from the sensitivity of the average

25
Challenges
  • Linear mixed effects modeling is becoming easy to do.
  • In JMP: choose REML from the pull-down menu (the option is only available if some factors are designated as random).
  • The challenge for researchers is interpretation when some variables are continuous.
  • Fitting nonlinear mixed effects models is tricky:
  • More local minima
  • The initial-value problem is exacerbated by the increase in parameters (e.g., A and B for each subject and each group)
  • Start with the simplest models (e.g., not allowing all parameters to vary, homogeneity of variance) and then relax assumptions.
  • NLME is not available in most stats packages.
  • Complex models (linear and nonlinear) are time-consuming to fit.

26
MLE: The Future
27
(No Transcript)