Title: Likelihood Methods in Ecology
1. Likelihood Methods in Ecology
- November 16th to 20th, 2009
- Millbrook, NY
- Instructors
- Charles Canham and María Uriarte
- Teaching Assistant
- Liza Comita
2. Daily Schedule
- Morning
- 8:30 - 9:20 Lecture
- 9:20 - 10:10 Case Study or Discussion
- 10:30 - 12:00 Lab
- Lunch 12:00 - 1:30 (in this room)
- Afternoon
- 1:30 - 2:20 Lecture
- 2:20 - 3:10 Lab
- 3:30 - 5:00 Lab
3. Course Outline: Statistical Inference Using Likelihood
- Principles and practice of maximum likelihood estimation
- Know your data: choosing appropriate likelihood functions
- Formulate statistical models as alternate hypotheses
- Find the ML estimates of the parameters of your models
- Compare alternate models and choose the most parsimonious
- Evaluate individual models
- Advanced topics
Likelihood is much more than a statistical
method... (it can completely change the way you
ask and answer questions)
4. Lecture 1: An Introduction to Likelihood Estimation
- Probability and probability density functions
- Maximum likelihood estimates (versus traditional method of moments estimates)
- Statistical inference
- Classical frequentist statistics: limitations and mental gyrations...
- The likelihood alternative: basic principles and definitions
- Model comparison as a generalization of hypothesis testing
5. A simple definition of probability for discrete events...
...the ratio of the number of events of type A to the total number of all possible events (outcomes). The enumeration of all possible outcomes is called the sample space (S). If there are n possible outcomes in a sample space, S, and m of those are favorable for event A, then the probability of event A is given as
P(A) = m/n
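As a concrete illustration (a fair six-sided die, not an example from the course itself), the same calculation in R:
S <- 1:6                  # sample space: n = 6 possible outcomes
A <- c(2, 4, 6)           # event A: an even roll, m = 3 favorable outcomes
length(A) / length(S)     # P(A) = m/n = 0.5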
6. Probability defined more generally...
- Consider an outcome X from some process that has a set of possible outcomes S
- If X and S are discrete, then P(X) = X/S
- If X is continuous, then the probability has to be defined in the limit
Where g(x) is a probability density function (PDF)
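In the usual textbook form, the probability of a continuous outcome falling in a small interval is the area under the density over that interval:
P(x \le X \le x + \Delta x) = \int_{x}^{x + \Delta x} g(u)\, du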
7. The Normal Probability Density Function (PDF)
μ = mean, σ² = variance
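For reference, the normal density is:
g(x \mid \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)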
- Properties of a PDF
- (1) g(x) ≥ 0 (and, for discrete distributions, g(x) ≤ 1)
- (2) ∫ g(x) dx = 1
8. Common PDFs...
- For continuous data
- Normal
- Lognormal
- Gamma
- For discrete data
- Poisson
- Binomial
- Multinomial
- Negative Binomial
See McLaughlin (1993), "A compendium of common probability distributions", in the reading list.
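Each of these has a built-in density (or probability mass) function in base R; a quick reference sketch, with arbitrary example values for the arguments:
dnorm(1, mean = 0, sd = 1)                        # Normal
dlnorm(1, meanlog = 0, sdlog = 1)                 # Lognormal
dgamma(1, shape = 2, rate = 1)                    # Gamma
dpois(3, lambda = 2)                              # Poisson
dbinom(3, size = 10, prob = 0.3)                  # Binomial
dmultinom(c(2, 3, 5), prob = c(0.2, 0.3, 0.5))    # Multinomial
dnbinom(3, size = 2, mu = 4)                      # Negative binomial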
9. Why are PDFs important?
Answer: because they are used to calculate likelihood (and in that case, they are called likelihood functions).
10. Statistical Estimators
A statistical estimator is a function, applied to a sample of data, that is used to estimate an unknown population parameter (and an estimate is just the result of applying an estimator to a sample).
11. Properties of Estimators
- Some desirable properties of point estimators (functions to estimate a fixed parameter):
- Bias: if the average error is zero, the estimate is unbiased
- Efficiency: an estimate with the minimum variance is the most efficient (note: the most efficient estimator is often biased)
- Consistency: as sample size increases, the probability of the estimate being close to the parameter increases
- Asymptotically normal: a consistent estimator whose distribution around the true parameter θ approaches a normal distribution, with standard deviation shrinking in proportion to 1/√n as the sample size n grows
12. Maximum likelihood (ML) estimates versus method of moments (MOM) estimates
Bottom line: MOM was born in the time before computers, and was OK; ML needs computing power, but has more desirable properties.
13. Doing it MOM's way: Central Moments
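For reference, the k-th sample central moment is taken around the sample mean:
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k
so m_2 is the (uncorrected) sample variance.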
14. What's wrong with MOM's way?
- Nothing, if all you are interested in is calculating properties of your sample
- But MOM's formulas are generally not the best way [1] to infer estimates of the statistical properties of the population from which the sample was drawn
- For example: population variance (because the second central moment is a biased underestimate of the population variance)
- [1] in the formal terms of bias, efficiency, consistency, and asymptotic normality
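A quick numerical check of that bias correction (the sample below is hypothetical, purely for illustration):
x <- c(2, 4, 4, 4, 5, 5, 7, 9)              # hypothetical sample
n <- length(x)
sum((x - mean(x))^2) / n                    # MOM: second central moment, divides by n (4.0)
var(x)                                      # R's var() divides by n - 1 (about 4.57)
(sum((x - mean(x))^2) / n) * n / (n - 1)    # the n/(n - 1) correction recovers var(x)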
15. The Maximum Likelihood alternative
Going back to PDFs: in plain language, a PDF allows you to calculate the probability that an observation will take on a value (x), given the underlying (true?) parameters of the population.
16. But there's a problem
- The PDF defines the probability of observing an outcome (x), given that you already know the true population parameter (θ)
- But we want to generate an estimate of θ, given our data (x)
- And, unfortunately, the two are not identical
17. Fisher and the concept of Likelihood...
The Likelihood Principle
In plain English: the likelihood (L) of the parameter estimates (θ), given a sample (x), is proportional to the probability of observing the data, given the parameters... and this probability is something we can calculate, using the appropriate underlying probability model (i.e. a PDF).
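In symbols:
L(\theta \mid x) \propto P(x \mid \theta)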
18. R. A. Fisher (1890-1962)
http://www.economics.soton.ac.uk/staff/aldrich/fisherguide/problik.htm
"Likelihood and Probability in R. A. Fisher's Statistical Methods for Research Workers" (John Aldrich): a good summary of the evolution of Fisher's ideas on probability, likelihood, and inference. Contains links to PDFs of Fisher's early papers. A second page shows the evolution of his ideas through changes in successive editions of Fisher's books.
(Photo: Fisher at age 22)
19. Calculating Likelihood and Log-Likelihood for Datasets
From basic probability theory: if two events (A and B) are independent, then P(A,B) = P(A)P(B).
More generally, for i = 1..n independent observations and a vector X of observations (x_i), the likelihood of the dataset is the product of the likelihoods of the individual observations.
But logarithms are easier to work with, so we sum log-likelihoods instead...
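In the usual notation:
L(\theta \mid X) = \prod_{i=1}^{n} g(x_i \mid \theta), \qquad \ln L(\theta \mid X) = \sum_{i=1}^{n} \ln g(x_i \mid \theta)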
20. Likelihood Surfaces
The variation in likelihood for any given set of parameter values defines a likelihood surface...
For a model with just one parameter, the surface is simply a curve (a.k.a. a likelihood profile).
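A minimal R sketch of such a surface for a two-parameter model (a normal mean and sd); the data vector and parameter grids here are made up purely for illustration:
y     <- c(4.1, 5.3, 4.8, 6.0, 5.2, 4.6)    # hypothetical observations
mu    <- seq(3, 7, 0.05)                    # candidate means
sigma <- seq(0.2, 2, 0.05)                  # candidate standard deviations
loglik <- outer(mu, sigma,
                Vectorize(function(m, s) sum(dnorm(y, mean = m, sd = s, log = TRUE))))
contour(mu, sigma, loglik, xlab = "mean", ylab = "sd")   # the log-likelihood surface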
21. Support and Support Limits
Log-likelihood = "Support" (Edwards 1992)
22. A (somewhat trivial) example
- MOM vs ML estimates of the probability of survival for a population
- Data: a quadrat in which 16 of 20 seedlings survived during a census interval. (Note that in this case, the quadrat is the unit of observation, so sample size = 1)
i.e. Given N = 20, x = 16, what is p?
x <- seq(0, 1, 0.005)      # candidate values of p
y <- dbinom(16, 20, x)     # likelihood of each p, given 16 of 20 survived
plot(x, y)
x[which.max(y)]            # the value of p that maximizes the likelihood
23. A more realistic example
# Create some data (5 quadrats)
N <- c(11, 14, 8, 22, 50)
x <- c(8, 7, 5, 17, 35)
# Calculate the log-likelihood for each probability of survival
p <- seq(0, 1, 0.005)
log_likelihood <- rep(0, length(p))
for (i in 1:length(p)) {
  log_likelihood[i] <- sum(log(dbinom(x, N, p[i])))
}
# Plot the likelihood profile
plot(p, log_likelihood)
# What probability of survival maximizes the log-likelihood?
p[which.max(log_likelihood)]   # 0.685
# How does this compare to the average across the 5 quadrats?
mean(x/N)                      # 0.665
24. Focus in on the MLE
What is the log-likelihood of the MLE?
max(log_likelihood)   # [1] -9.46812
- Things to note about log-likelihoods:
- They should always be negative! (if not, you have a problem with your likelihood function)
- The absolute magnitude of the log-likelihood increases as sample size increases
25. An example with continuous data
The normal PDF: x = observed value, μ = mean, σ² = variance
In R: dnorm(x, mean = 0, sd = 1, log = FALSE)
> dnorm(2, 2.5, 1)
[1] 0.3520653
> dnorm(2, 2.5, 1, log = TRUE)
[1] -1.043939
Problem: now there are TWO unknowns needed to calculate likelihood (the mean and the variance)!
Solution: treat the variance just like another parameter in the model, and find the ML estimate of the variance just like you would any other parameter (this is exactly what you'll do in the lab this morning).
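A minimal sketch of that idea, using optim() to maximize the log-likelihood over both parameters at once (the data vector and starting values here are hypothetical, not the lab dataset):
y <- c(2.1, 3.4, 2.8, 3.9, 3.1, 2.5)                        # hypothetical observations
negLL <- function(par) -sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE))
fit <- optim(c(mean(y), sd(y)), negLL)                      # numerical search over (mean, sd)
fit$par                                                     # ML estimates of the mean and sd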