1
Maximum Likelihood Estimation
2
Maximum Likelihood Estimation (MLE)
  • Most statistical methods are designed to minimize error.
  • They choose the parameter values that minimize predictive error, y − ŷ, or squared error, (y − ŷ)².
  • Maximum likelihood estimation instead seeks the parameter values that are most likely to have produced the observed distribution.

3
Likelihood and PDFs
  • For a continuous variable, the likelihood of a
    particular value is obtained from the PDF
    (probability density function).

[Figure: example Gamma and Normal PDFs]
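As an aside not on the slide, a minimal SciPy sketch of reading a likelihood off a PDF (the two distributions and the value x = 2 are arbitrary illustrative choices):

  # The likelihood of one observation under a continuous distribution
  # is the height of the PDF at that value, not a probability.
  from scipy.stats import norm, gamma

  x = 2.0
  print(norm.pdf(x, loc=0, scale=1))     # likelihood of x under Normal(0, 1)
  print(gamma.pdf(x, a=2.0, scale=1.0))  # likelihood of x under Gamma(shape 2)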
4
Likelihood ≠ Probability (for continuous
distributions)
P(x = 2) = 0
P(x ≤ 2) = red AUC
[Figure: a PDF with the area under the curve to the left of x = 2 shaded red]
5
Maximum likelihood estimates of parameters
  • For MLE, the goal is to determine the most likely values of the population parameters (e.g., µ, σ, β, ρ, …) given observed sample values (e.g., x-bar, s, b, r, …).
  • Any model's parameters (e.g., β in linear regression; a, b, c, etc. in nonlinear models; weights in backprop) can be estimated using MLE.

6
Likelihood is based on the shape of the d.v.'s
distribution!!!
  • ANOVA, Pearson's r, the t-test, and regression all assume that the d.v. is normally distributed.
  • Under those conditions, the LSE (least squares estimate) is the MLE.
  • If the d.v. is not normally distributed, the LSE is not the MLE.
  • So, the first step is to determine the shape of the distribution of your d.v.

7
Step 1: Identify the distribution
  • Normal, lognormal, beta, gamma, binomial, multinomial, Weibull, Poisson, exponential, …
  • AAAHHH!
  • Precision isn't critical unless the sample size is huge.
  • Most stats packages can fit a d.v. distribution using various distribution classes.
  • In JMP, do a distribution analysis and then try various distribution fits (see the sketch after this list).
  • Note: 0 and negative values are illegal for some fits.
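A hedged Python sketch of this step for readers without JMP; the candidate list and the simulated data are illustrative assumptions, not from the slides:

  # Fit several candidate distributions by MLE and compare log-likelihoods.
  import numpy as np
  from scipy import stats

  y = np.random.default_rng(1).gamma(shape=2.0, scale=3.0, size=200)  # fake d.v.

  for dist in (stats.norm, stats.lognorm, stats.gamma):
      params = dist.fit(y)                # MLE estimates of the parameters
      ll = dist.logpdf(y, *params).sum()  # log-likelihood of the data
      print(dist.name, round(ll, 1))      # higher means a better fit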

8
Step 2: Choose the analysis
  • If only looking at linear models and fixed effects, use GLM.
  • GLM allows you to specify the d.v. distribution type (see the sketch after this list).
  • (You'll know you have an MLE method if the output includes likelihoods.)
  • Random effects will be considered later.
  • Otherwise, you need to modify your fitting method to use a different loss function.
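A sketch of the same step in Python's statsmodels rather than JMP/SPSS; the Poisson d.v. and coefficients are invented for illustration:

  # GLM lets you declare the d.v. distribution via a "family";
  # the printed log-likelihood confirms this is an MLE method.
  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(0)
  x = rng.normal(size=100)
  y = rng.poisson(np.exp(0.5 + 0.8 * x))   # count d.v.

  X = sm.add_constant(x)
  result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
  print(result.llf)                        # the likelihood in the output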

9
GLM Distributions and Link Functions
  • The distribution specifies the nature of the error distribution.
  • The link function g provides the relationship between the linear predictor Xβ and the mean of the distribution: g(E(Y)) = Xβ.
  • Most often, the distribution determines the best link function (the so-called canonical link function).
  • For example, a Y distribution that ranges between 0 and 1 (binomial) necessitates a link function that is likewise constrained (logit, probit, complementary log-log); see the sketch below.
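To make that constraint concrete, a tiny hand-rolled sketch of the logit link (an illustration, not a library API):

  import numpy as np

  def logit(mu):        # g(mu) = log(mu / (1 - mu)), defined on (0, 1)
      return np.log(mu / (1 - mu))

  def inv_logit(eta):   # g^(-1)(eta) = E(Y); any real eta maps into (0, 1)
      return 1 / (1 + np.exp(-eta))

  print(inv_logit(-3.0), inv_logit(0.0), inv_logit(3.0))  # 0.047, 0.5, 0.953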

10
Distributions and Link Functions
  • Distributions
  • In JMP: Normal, binomial (0/1), Poisson (count data), exponential (positive continuous)
  • Link functions
  • In JMP: Identity, logit, probit, log, reciprocal, power, complementary log-log

11
Poisson distribution for count data (courtesy of Wikipedia)
[Figure: Poisson PMFs for several values of λ]
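Since the figure itself is not in the transcript, a short SciPy sketch of the Poisson PMF for count data (λ = 4 is an arbitrary choice):

  # P(k) = lambda^k * exp(-lambda) / k!
  from scipy.stats import poisson

  for k in range(9):
      print(k, round(poisson.pmf(k, 4.0), 3))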
12
http://www.mathworks.com/products/demos/statistics/glmdemo.html
13
Canonical Link Functions
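The slide's table did not survive the transcript; the standard canonical pairings (a textbook reference, not recovered from the slide image) are:

  Distribution   Canonical link   g(µ)
  Normal         identity         µ
  Binomial       logit            log(µ / (1 − µ))
  Poisson        log              log(µ)
  Gamma          reciprocal       1/µ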
14
Step 3: Loss functions
  • LSE uses (y − ŷ)² as the loss function and tries to minimize the sum of this quantity across rows → SSE.
  • MLE loss functions depend on the assumed distribution of the d.v.

15
MLE Loss functions
  • The likelihood function is the joint probability of all of the data.
  • For example, P(y₁ | µ = 2) for row 1, and P(y₂ | µ = 2) for row 2, and P(y₃ | µ = 2) for row 3, …
  • … which equals the product L(µ = 2) = ∏ᵢ P(yᵢ | µ = 2).
  • It's mathematically easier to deal with sums, so we'll take the log of that quantity: log L = Σᵢ log P(yᵢ | µ = 2).
16
MLE Loss functions, cont.
  • Now we have something that can be computed for each row and summed.
  • But we want the maximum of that last equation, whereas loss functions should be minimized.
  • Easy! We'll just negate it.
  • The negative log-likelihood, −Σᵢ log P(yᵢ | parameters), becomes our loss function (see the sketch below).
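A minimal sketch of that loss function in Python, assuming a normally distributed d.v. and made-up data (both assumptions, not from the slides):

  # Negative log-likelihood: sum -log f(y_i | mu, sigma) across rows.
  import numpy as np
  from scipy.stats import norm

  def nll(params, y):
      mu, sigma = params
      return -norm.logpdf(y, loc=mu, scale=sigma).sum()

  y = np.array([4.1, 5.0, 3.8, 5.5, 4.6])
  print(nll((4.0, 1.0), y))   # loss at mu = 4, sigma = 1
  print(nll((4.6, 0.6), y))   # parameters nearer the data give a smaller loss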

17
So, once you know the PDF …
  • take the log of the function and negate it.
  • This doesn't move the extremum of the PDF: the log preserves its location, and the negation merely turns the maximum into a minimum at the same point.

18
[Figure: three panels showing the PDF, log(PDF), and −log(PDF); the extremum sits at the same parameter value in each]
19
Step 4: Find Parameter Values that Maximize
Likelihood (i.e., that Minimize −LL)
  • There are no general closed-form solutions, so iterative methods are again used (a sketch follows).
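A sketch of the iterative search with scipy.optimize, using the same made-up data as above (optimizing log-sigma is a common reparameterization, not something the slides prescribe):

  import numpy as np
  from scipy.optimize import minimize
  from scipy.stats import norm

  y = np.array([4.1, 5.0, 3.8, 5.5, 4.6])

  def nll(params):
      mu, log_sigma = params             # optimizing log(sigma) keeps sigma > 0
      return -norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)).sum()

  fit = minimize(nll, x0=[0.0, 0.0])     # iterative minimization (BFGS default)
  print(fit.x[0], np.exp(fit.x[1]))      # approx. the sample mean and ML s.d.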

20
Step 5: Model comparison in MLE
  • Model choice using AIC/BIC:
  • AIC = −2LL + 2k;  BIC = −2LL + k·ln(n)
  • Computation of the likelihood ratio (LR):
  • LR = L(model 1) / L(model 2)
  • USE LR ONLY FOR NESTED MODELS!
[Diagram: likelihood-ratio scale between the two models; LR = 2 favors Model 1, LR = 1 favors neither, LR = 0.2 favors Model 2]
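A small sketch of these comparisons in Python; the two log-likelihoods and parameter counts are hypothetical numbers:

  import numpy as np

  def aic(ll, k):            # AIC = -2LL + 2k
      return -2 * ll + 2 * k

  def bic(ll, k, n):         # BIC = -2LL + k * ln(n)
      return -2 * ll + k * np.log(n)

  ll1, k1 = -120.3, 3        # hypothetical model 1
  ll2, k2 = -118.9, 5        # hypothetical model 2 (model 1 nested within it)

  print(aic(ll1, k1), aic(ll2, k2))
  print(bic(ll1, k1, 100), bic(ll2, k2, 100))
  print(np.exp(ll1 - ll2))   # likelihood ratio L(model 1) / L(model 2)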
21
Repeated Measures MLE
  • What if you have repeated-measures data and want to do MLE?
  • You need to be able to specify fixed (typically your IVs) and random (subject) effects.
  • Solutions:
  • GLM with fixed and random effects (SPSS)
  • GLM in SPSS does support random effects, but it estimates their parameters as if they were fixed.
  • GLM can't handle missing data without deleting rows.
  • GLM requires an equal number of observations for each subject.
  • GLM requires the sphericity assumption to be met, etc.
  • Mixed effects modeling
  • Linear mixed effects
  • Nonlinear mixed effects

22
Mixed Effects Modeling
  • Some classic methods are special cases of mixed effects modeling:
  • HLM (hierarchical linear modeling): lme
  • Growth curve analysis: lme, lme with transformed y, nlme
  • Some growth curve analysis is different
  • Multilevel models: lme
  • Random coefficient models: lme

23
What is it?
  • Uses MLE
  • Includes fixed and random effects (hence "mixed")
  • Can estimate parameters at different levels (hence "multilevel modeling")
  • The group parameter estimate is a function of the individual parameter estimates, and vice versa
  • Ditto for each higher level of group parameters (e.g., state/county/school/grade/class/subject)
  • Allows estimation of the variance-covariance structure:
  • Repeated measures ANOVA assumes compound symmetry
  • MANOVA assumes an unstructured matrix
  • Mixed effects modeling lets you fit various types of structures, from unstructured to identity (homogeneity of variance); a sketch follows
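A hedged sketch of a linear mixed-effects fit in Python's statsmodels, with simulated repeated-measures data (the slides discuss JMP/SPSS and R's lme/nlme; this is an equivalent tool, not the author's):

  # Fixed effect of x, random intercept per subject, fit by REML.
  import numpy as np
  import pandas as pd
  import statsmodels.formula.api as smf

  rng = np.random.default_rng(0)
  subjects = np.repeat(np.arange(10), 8)            # 10 subjects, 8 obs each
  x = np.tile(np.arange(8.0), 10)
  y = 2.0 + 0.5 * x + rng.normal(0, 1, 10)[subjects] + rng.normal(0, 0.5, 80)

  data = pd.DataFrame({"y": y, "x": x, "subject": subjects})
  result = smf.mixedlm("y ~ x", data, groups=data["subject"]).fit(reml=True)
  print(result.summary())                           # fixed effects + variances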

24
Advantages of mixed effects modeling
  • Uses all available data
  • When data are missing, estimates are simply less certain (recall MLE!)
  • Allows testing of various variance-covariance structures, and is thus neither too conservative nor too liberal
  • Can allow parameter values to vary or be fixed at each level
  • Fits individual subjects, not the average subject
  • The average subject can be an artifact (esp. for nonlinear fits)
  • Example: learning curves, where average sensitivities differ from the sensitivity of the average

25
Challenges
  • Linear mixed effects modeling is becoming easy to do.
  • In JMP: choose REML from the pull-down menu (the option is only available if some factors are designated as random).
  • The challenge for researchers is interpretation when some variables are continuous.
  • Fitting nonlinear mixed effects models is tricky:
  • More local minima
  • The initial-value problem is exacerbated by the increase in parameters (e.g., A and B for each subject and each group)
  • Start with the simplest models (e.g., not allowing all parameters to vary, homogeneity of variance) and then relax assumptions.
  • NLME is not available in most stats packages.
  • Complex models (linear and nonlinear) are time-consuming to fit.

26
MLE: The Future
27
(No Transcript)