Title: Maximum Likelihood Estimation
1. Maximum Likelihood Estimation
2. Maximum Likelihood Estimation (MLE)
- Most statistical methods are designed to minimize error.
- Choose the parameter values that minimize predictive error, y - ŷ, or squared error, (y - ŷ)².
- Maximum likelihood estimation instead seeks the parameter values that are most likely to have produced the observed data.
3. Likelihood and PDFs
- For a continuous variable, the likelihood of a particular value is obtained from the PDF (probability density function).
[Figure: example PDFs of the Gamma and Normal distributions]
4. Likelihood ≠ Probability (for continuous distributions)
- For a continuous variable, the probability of any exact value is zero: P(x = 2) = 0.
- Probability is instead attached to intervals: P(x ≤ 2) is the shaded area under the curve (the red AUC in the original figure), while the likelihood of x = 2 is the height of the PDF at that point (see the sketch below).
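As an aside not on the original slides, the distinction is easy to see numerically; a minimal sketch with scipy.stats, assuming a standard normal distribution:

```python
# Minimal sketch (not from the slides): for a continuous variable, the PDF
# height at a point is its likelihood, while probability comes from
# integrating the PDF over an interval.
from scipy.stats import norm

dist = norm(loc=0, scale=1)          # standard normal, chosen for illustration

density = dist.pdf(2)                # likelihood (density) at x = 2
prob_exact = 0.0                     # P(x = 2) is exactly 0 for a continuous variable
prob_interval = dist.cdf(2)          # P(x <= 2) = area under the curve left of 2

print(f"pdf(2)  = {density:.4f}")    # height of the curve, not a probability
print(f"P(x<=2) = {prob_interval:.4f}")
```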
5. Maximum likelihood estimates of parameters
- For MLE, the goal is to determine the most likely values of the population parameters (e.g., µ, σ, β, ρ, ...) given observed sample values (e.g., x-bar, s, b, r, ...).
- Any model's parameters (e.g., β in linear regression; a, b, c, etc. in nonlinear models; weights in backprop) can be estimated using MLE.
6. Likelihood is based on the shape of the d.v.'s distribution!!!
- ANOVA, Pearson's r, t-tests, and regression all assume that the d.v. is normally distributed.
- Under those conditions, the LSE (least squares estimate) is the MLE.
- If the d.v. is not normally distributed, the LSE is not the MLE.
- So, the first step is to determine the shape of the distribution of your d.v.
7. Step 1: Identify the distribution
- Normal, lognormal, beta, gamma, binomial, multinomial, Weibull, Poisson, exponential... AAAHHH!
- Precision isn't critical unless the sample size is huge.
- Most stats packages can fit a d.v. distribution using various distribution classes (a scripted sketch follows this list).
- In JMP, do a distribution analysis and then try various distribution fits.
- Note: 0 and negative values are illegal for some fits.
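The slides do this step in JMP; a hedged equivalent in Python with scipy.stats, where the sample data and the candidate list are purely illustrative:

```python
# Hedged sketch of Step 1: fit several candidate distributions by MLE
# and compare their log-likelihoods on the same data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
dv = rng.gamma(shape=2.0, scale=3.0, size=500)   # stand-in for your d.v.

candidates = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "exponential": stats.expon,
    "weibull": stats.weibull_min,
}

for name, dist in candidates.items():
    params = dist.fit(dv)                     # MLE fit of this distribution's parameters
    ll = np.sum(dist.logpdf(dv, *params))     # total log-likelihood under the fit
    print(f"{name:12s} log-likelihood = {ll:8.1f}")

# As the slide warns, 0 and negative values are illegal for some fits
# (e.g., lognormal and gamma require positive data).
```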
8. Step 2: Choose analysis
- If only looking at linear models and fixed effects, use GLM (a sketch follows this list).
- GLM allows you to specify the d.v. distribution type.
- (You'll know you have an MLE method if the output includes likelihoods.)
- Random effects will be considered later.
- Otherwise, you need to modify your fitting method to use a different loss function.
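A minimal sketch of what such a GLM fit looks like in Python's statsmodels (an assumption; the slides describe JMP/SPSS), with simulated count data:

```python
# Hedged GLM sketch: specify the d.v. distribution (here Poisson for counts)
# and fit by maximum likelihood; the data are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(lam=np.exp(0.5 + 0.8 * x))       # count d.v. -> Poisson family

X = sm.add_constant(x)                           # intercept + predictor
model = sm.GLM(y, X, family=sm.families.Poisson())
result = model.fit()                             # fit by maximum likelihood

print(result.summary())                          # the output includes the log-likelihood,
                                                 # the telltale sign of an MLE method
```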
9. GLM Distributions and Link Functions
- The distribution specifies the nature of the error distribution.
- The link function g provides the relationship between the linear predictor and the mean of the distribution: g(E(Y)) = Xβ, i.e., E(Y) = g⁻¹(Xβ).
- Most often, the distribution determines the best link function (aka the canonical link function); see the sketch after this list.
- For example, a Y distribution that ranges between 0 and 1 (binomial) would necessitate a link function that is likewise constrained (logit, probit, complementary log-log).
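A sketch of setting the link explicitly, assuming a recent statsmodels version (the capitalized link classes such as links.Probit); the 0/1 data are simulated:

```python
# Hedged sketch: the Binomial family defaults to its canonical logit link,
# but a different constrained link (probit) can be passed explicitly.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=300)
p = 1 / (1 + np.exp(-(0.3 + 1.2 * x)))           # true success probabilities
y = (rng.uniform(size=300) < p).astype(int)      # simulated 0/1 d.v.

X = sm.add_constant(x)

# Canonical logit link (the default for Binomial)...
logit_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# ...or the same family with a probit link, also bounded to (0, 1).
probit_link = sm.families.links.Probit()
probit_fit = sm.GLM(y, X, family=sm.families.Binomial(link=probit_link)).fit()

print(logit_fit.params, probit_fit.params)
```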
10. Distributions and Link Functions
- Distributions in JMP: Normal, binomial (0/1), Poisson (count data), exponential (positive continuous)
- Link functions in JMP: Identity, logit, probit, log, reciprocal, power, complementary log-log
11. Poisson distribution for count data (courtesy of Wikipedia)
12. http://www.mathworks.com/products/demos/statistics/glmdemo.html
13. Canonical Link Functions
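The table itself did not survive the transcript; the standard canonical pairings it would have shown are:
- Normal: identity
- Binomial: logit
- Poisson: log
- Gamma: reciprocal (inverse)
- Inverse Gaussian: 1/µ²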
14. Step 3: Loss functions
- LSE uses (y - ŷ)² as the loss function and tries to minimize the sum of this quantity (across rows) → SSE.
- MLE loss functions depend on the assumed distribution of the d.v.
15. MLE loss functions
- The likelihood function is the joint probability of all of the data.
- For example, P(x₁ | µ, σ) for row 1 and P(x₂ | µ, σ) for row 2 and P(x₃ | µ, σ) for row 3...
- Which equals the product: L(µ, σ) = ∏ᵢ P(xᵢ | µ, σ).
- It's mathematically easier to deal with sums, so we'll take the log of that quantity: log L(µ, σ) = Σᵢ log P(xᵢ | µ, σ).
16. MLE loss functions, cont.
- Now we have something that can be computed for each row and summed.
- But we want the maximum of that last equation, whereas loss functions should be minimized.
- Easy! We'll just negate it: -log L(µ, σ) = -Σᵢ log P(xᵢ | µ, σ).
- The negative log-likelihood becomes our loss function (a sketch follows this list).
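A minimal numeric sketch, assuming a normal d.v.; the sample values and candidate parameters are made up for illustration:

```python
# Minimal sketch: the negative log-likelihood as a per-row quantity
# that is summed and negated, exactly as the slide describes.
import numpy as np
from scipy.stats import norm

data = np.array([4.2, 5.1, 3.8, 6.0, 4.9])          # illustrative sample, one value per row

mu, sigma = 5.0, 1.0                                 # candidate parameter values

row_loglik = norm.logpdf(data, loc=mu, scale=sigma)  # log-likelihood of each row
nll = -np.sum(row_loglik)                            # negate the sum -> loss to minimize

print(row_loglik)                                    # one term per row
print(f"negative log-likelihood = {nll:.3f}")
```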
17. So, once you know the PDF...
- Take the log of the function and negate it.
- This doesn't change the location of the maximum/minimum of the PDF.
18. [Figure: the same function plotted as PDF, log(PDF), and -log(PDF)]
19. Step 4: Find parameter values that maximize likelihood (i.e., that minimize the negative log-likelihood)
- There are no general closed-form solutions, so iterative methods are again used (a sketch follows).
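A hedged sketch of that iterative search using scipy.optimize.minimize; the data are simulated, and the log-sigma reparameterization is just one common way to keep σ positive:

```python
# Hedged sketch of Step 4: minimize the negative log-likelihood numerically
# (the slides' stats packages do this same search internally).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
data = rng.normal(loc=5.0, scale=2.0, size=400)   # illustrative sample

def nll(params):
    mu, log_sigma = params                        # optimize log(sigma) so sigma stays positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

result = minimize(nll, x0=[0.0, 0.0])             # iterative search from a starting value
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

print(f"MLE: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
# For a normal d.v. these match the sample mean and the (ML) standard deviation.
```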
20. Step 5: Model comparison in MLE
- Model choice using AIC/BIC (computed in the sketch below):
  - AIC = -2LL + 2k
  - BIC = -2LL + k·ln(n)
- Computation of the likelihood ratio (LR): LR = L(model 1) / L(model 2)
- USE LR ONLY FOR NESTED MODELS!
[Diagram: Model 1 vs. Model 2 likelihood ratios, e.g., LR = 1, LR = 2, LR = .2]
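The formulas translate directly into code; a small illustrative computation with made-up numbers:

```python
# Illustrative AIC/BIC computation matching the slide's formulas;
# ll, k, and n are hypothetical values, not results from real data.
import numpy as np

ll = -412.7   # maximized log-likelihood of a hypothetical model
k = 3         # number of estimated parameters
n = 200       # number of observations

aic = -2 * ll + 2 * k
bic = -2 * ll + k * np.log(n)

print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")
# statsmodels results expose these directly as result.aic and result.bic.
```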
21. Repeated measures MLE
- What if you have repeated measures data and want to do MLE?
- You need to be able to specify fixed (typically your IVs) and random (subject) effects.
- Solutions:
- GLM with fixed and random effects (SPSS):
  - GLM in SPSS does support random effects but estimates their parameters as if they were fixed.
  - GLM can't handle missing data without deleting rows.
  - GLM requires an equal number of observations for each subject.
  - GLM requires the sphericity assumption to be met, etc.
- Mixed effects modeling (see the sketch after this list):
  - Linear mixed effects
  - Nonlinear mixed effects
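A hedged sketch of a linear mixed effects fit in Python's statsmodels (an assumption; the slides point to SPSS, JMP, and R's lme), with simulated repeated measures data:

```python
# Hedged sketch: a linear mixed effects model with a fixed effect (time, the IV)
# and a random per-subject intercept; the data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_subj, n_obs = 20, 8
subj = np.repeat(np.arange(n_subj), n_obs)            # subject id for each row
time = np.tile(np.arange(n_obs), n_subj)              # repeated measure within subject
subj_intercept = rng.normal(0, 1.5, n_subj)[subj]     # random (subject) effect
y = 2.0 + 0.5 * time + subj_intercept + rng.normal(0, 1.0, n_subj * n_obs)

df = pd.DataFrame({"y": y, "time": time, "subject": subj})

model = smf.mixedlm("y ~ time", data=df, groups=df["subject"])
result = model.fit()                                  # fit by (RE)ML
print(result.summary())
```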
22. Mixed Effects Modeling
- Some classic methods are special cases of mixed effects modeling:
  - HLM (hierarchical linear modeling): lme
  - Growth curve analysis: lme, lme with transformed y, nlme (some growth curve analysis is different)
  - Multilevel models: lme
  - Random coefficient models: lme
23. What is it?
- Uses MLE.
- Includes fixed and random effects (hence "mixed").
- Can estimate parameters at different levels (hence "multilevel modeling"):
  - The group parameter estimate is a function of the individual parameter estimates, and vice versa.
  - Ditto for each higher level of group parameters (e.g., state/county/school/grade/class/subject).
- Allows estimation of the variance-covariance structure:
  - Repeated measures ANOVA assumes compound symmetry.
  - MANOVA assumes an unstructured matrix.
  - Mixed effects allows you to fit various types of structures, from unstructured to identity (homogeneity of variance).
24. Advantages of mixed effects modeling
- Uses all available data: when data are missing, estimates are simply less certain (recall MLE!).
- Allows testing of various variance-covariance structures, so the analysis is neither too conservative nor too liberal.
- Can allow parameter values to vary or be fixed at each level.
- Fits individual subjects, not the average subject:
  - The average subject can be an artifact (especially for nonlinear fits).
  - Example: learning curves (average sensitivities vs. the sensitivity of the average).
25. Challenges
- Linear mixed effects modeling is becoming easy to do:
  - In JMP, choose REML from the pull-down menu (the option is only available if some factors are designated as random).
  - The challenge for researchers is interpretation when some variables are continuous.
- Fitting nonlinear mixed effects models is tricky:
  - More local minima.
  - The initial-value problem is exacerbated by the increase in parameters (e.g., A and B for each subject and each group).
  - Start with the simplest models (e.g., not allowing all parameters to vary, homogeneity of variance) and then relax assumptions.
  - NLME is not available in most stats packages.
- Complex models (linear and nonlinear) are time-consuming to fit.
26. MLE: The Future