Title: Estimating parameters from data
1. Estimating parameters from data
- Gil McVean, Department of Statistics
- Thursday 13th February 2009
2. Questions to ask
- How can I estimate model parameters from data?
- What should I worry about when choosing between estimators?
- Is there some optimal way of estimating parameters from data?
- How can I compare different parameter values?
- How should I make statements about certainty regarding estimates and hypotheses?
3. Motivating example I
- I conduct an experiment where I measure the weight of 100 mice that were exposed to a normal diet and 50 mice exposed to a high-energy diet
- I want to estimate the expected gain in weight due to the change in diet
[Figure: weight distributions for the normal and high-calorie diet groups]
4. Motivating example II
- I observe the co-segregation of two traits (e.g. a visible trait and a genetic marker) in a cross
- I want to estimate the recombination rate between the two markers
5. Parameter estimation
- We can formulate most questions in statistics in terms of making statements about underlying parameters
- We want to devise a framework for estimating those parameters and making statements about our certainty
- In this lecture we will look at several different approaches to making such statements
  - Moment estimators
  - Likelihood
  - Bayesian estimation
6. Moment estimation
- You have already come across one way of estimating parameter values: moment methods
- In such techniques, parameter values are found that match sample moments (mean, variance, etc.) to those expected
- E.g. for random variables X1, X2, ... sampled from a N(μ, σ²) distribution (the estimators are written out below)
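Writing the observations as x1, ..., xn, matching the first two sample moments to their expectations gives:

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$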
7. Example: fitting a gamma distribution
- The gamma distribution is parameterised by a shape parameter, α, and a scale parameter, β
- The mean of the distribution is α/β and the variance is α/β²
- We can fit a gamma distribution by looking at the first two sample moments (see the sketch below)
[Figure: alkaline phosphatase measurements in 2019 mice, with fitted values α = 4.03, β = 0.14]
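A minimal sketch of the moment calculation (the simulated data here are purely illustrative, not the alkaline phosphatase measurements):

```python
import numpy as np

def fit_gamma_moments(x):
    """Moment estimates for a gamma distribution with mean a/b and variance a/b**2."""
    m, v = np.mean(x), np.var(x)
    b_hat = m / v        # mean / variance
    a_hat = m * b_hat    # mean**2 / variance
    return a_hat, b_hat

# Illustrative data only
rng = np.random.default_rng(1)
x = rng.gamma(shape=4.0, scale=1.0 / 0.14, size=2019)
print(fit_gamma_moments(x))   # close to (4.0, 0.14) for a sample of this size
```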
8. Bias
- Although the moment method looks sensible, it can lead to biased estimators
- In the previous example, estimates of both parameters are upwardly biased
- Bias is measured by the difference between the expected estimate and the truth (see below)
- However, bias is not the only thing to worry about
- For example, the value of the first observation is an unbiased estimator of the mean for a Normal distribution. However, it is a rubbish estimator
- We also need to worry about the variance of an estimator
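In symbols, for an estimator θ̂ of a parameter θ:

$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta, \qquad \mathrm{Var}(\hat{\theta}) = E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]$$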
9. Example: estimating the population mutation rate
- In population genetics, a parameter of interest is the population-scaled mutation rate
- There are two common estimators for this parameter (written out below)
  - The average number of differences between two sequences
  - The total number of polymorphic sites in the sample divided by a constant that is approximately the log of the sample size
- Which is better?
- The first estimator has larger variance than the second, suggesting that it is an inferior estimator
- It is actually worse than this: it is not even guaranteed to converge on the truth as the sample size gets infinitely large
  - A property called consistency
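In the usual notation (my addition), with S segregating sites and π̄ the average number of pairwise differences in a sample of n sequences, the two estimators are:

$$\hat{\theta}_{\pi} = \bar{\pi}, \qquad \hat{\theta}_{W} = \frac{S}{\sum_{i=1}^{n-1} 1/i} \approx \frac{S}{\log n}$$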
10. The bias-variance trade-off
- Some estimators may be biased
- Some estimators may have large variance
- Which is better?
- A simple way of combining both metrics is to consider the mean-squared error of an estimator (defined below)
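Written out:

$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2$$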
11. Example
- Consider two ways of estimating the variance of a Normal distribution from the sample variance
- The second estimator is unbiased, but the first estimator has lower MSE
- Actually, there is a third estimator, which is even more biased than the first, but which has even lower MSE (see the simulation sketch below)
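A sketch of the comparison, assuming the three estimators divide the sum of squared deviations by n, n − 1 and n + 1 respectively (the slide does not spell this out, but these are the standard choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 10, 1.0, 100_000

# reps independent samples of size n from N(0, sigma2)
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

for divisor, label in [(n, "divide by n (MLE)"),
                       (n - 1, "divide by n-1 (unbiased)"),
                       (n + 1, "divide by n+1")]:
    est = ss / divisor
    print(f"{label:25s} bias = {est.mean() - sigma2:+.3f}, MSE = {np.mean((est - sigma2) ** 2):.3f}")
```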
12. Least squares estimation
- A commonly used approach to fitting models to data is called least squares estimation
- This attempts to minimise the sum of the squares of residuals (see below)
- A residual is the difference between an observed and a fitted value
- An important point to remember is that minimising LS is not the only thing to worry about when fitting a model
  - Over-fitting
13. Problems with moment estimation
- It is not always possible to exactly match sample moments with their expectation
- It is not clear when using moment methods how much of the information in the data about the parameters is being used
  - Often not much...
- Why should MSE be the best way of measuring the value of an estimator?
14. Is there an optimal way to estimate parameters?
- For any model, the maximum information about model parameters is obtained by considering the likelihood function
- The likelihood function is proportional to the probability of observing the data given a specified parameter value
- One natural choice for point estimation of parameters is the maximum likelihood estimate: the parameter values that maximise the probability of observing the data
- The maximum likelihood estimate (mle) has some useful properties (though it is not always optimal in every sense)
15. An intuitive view on likelihood
16. An example
- Suppose we have data generated from a Poisson distribution. We want to estimate the parameter of the distribution
- The probability of observing a particular random variable is given below
- If we have observed a series of iid Poisson RVs, we obtain the joint likelihood by multiplying the individual probabilities together
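Writing λ for the Poisson parameter and x1, ..., xn for the observations:

$$P(X = x \mid \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \propto e^{-n\lambda}\lambda^{\sum_i x_i}$$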
17. Comments
- Note that in the likelihood function the factorials have disappeared. This is because they provide a constant that does not influence the relative likelihood of different values of the parameter
- It is usual to work with the log likelihood rather than the likelihood. Note that maximising the log likelihood is equivalent to maximising the likelihood
- We can find the mle of the parameter analytically (worked through below):
  - Take the natural log of the likelihood function
  - Find where the derivative of the log likelihood is zero
  - Note that here the mle is the same as the moment estimator
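For the Poisson example:

$$\ell(\lambda) = \log L(\lambda) = \sum_{i=1}^{n} x_i \log\lambda - n\lambda + \text{const}, \qquad \frac{d\ell}{d\lambda} = \frac{\sum_i x_i}{\lambda} - n = 0 \;\Rightarrow\; \hat{\lambda} = \bar{x}$$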
18. Sufficient statistics
- In this example we could write the likelihood as a function of a simple summary of the data: the mean
- This is an example of a sufficient statistic. These are statistics that contain all information about the parameter(s) under the specified model
- For example, suppose we have a series of iid normal RVs: here the mean and the mean square are jointly sufficient (see below)
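For n iid N(μ, σ²) observations, the log-likelihood depends on the data only through ∑ xi and ∑ xi² (equivalently, the mean and the mean square):

$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\Big(\sum_i x_i^2 - 2\mu\sum_i x_i + n\mu^2\Big)$$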
19. Properties of the maximum likelihood estimate
- The maximum likelihood estimate can be found either analytically or by numerical maximisation
- The mle is consistent, in that it converges to the truth as the sample size gets infinitely large
- The mle is asymptotically efficient, in that it achieves the minimum possible variance (the Cramér-Rao lower bound) as n → ∞
- However, the mle is often biased for finite sample sizes
- For example, the mle for the variance parameter in a normal distribution is the sample variance (see below)
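Explicitly, the mle of σ² divides by n rather than n − 1 and so is biased downwards:

$$\hat{\sigma}^2_{\mathrm{mle}} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad E\big[\hat{\sigma}^2_{\mathrm{mle}}\big] = \frac{n-1}{n}\,\sigma^2$$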
20. Comparing parameter estimates
- Obtaining a point estimate of a parameter is just one problem in statistical inference
- We might also like to ask how good different parameter values are
- One way of comparing parameters is through relative likelihood
- For example, suppose we observe counts of 12, 22, 14 and 8 from a Poisson process
- The maximum likelihood estimate is 14. The relative likelihood is given below
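The relative likelihood is L(λ) / L(λ̂). A minimal sketch of the calculation, working on the log scale:

```python
import numpy as np
from scipy.stats import poisson

counts = np.array([12, 22, 14, 8])
lam_hat = counts.mean()                       # maximum likelihood estimate = 14

def relative_likelihood(lam):
    """L(lam) / L(lam_hat) for the observed counts."""
    return np.exp(poisson.logpmf(counts, lam).sum()
                  - poisson.logpmf(counts, lam_hat).sum())

for lam in (10, 12, 14, 16, 18):
    print(lam, round(relative_likelihood(lam), 3))
```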
21. Using relative likelihood
- The relative likelihood and log likelihood surfaces are shown below
[Figure: relative likelihood and log-likelihood curves for the Poisson example]
22. Interval estimation
- In most cases, the chance that the point estimate you obtain for a parameter is actually the correct one is zero
- We can generalise the idea of point estimation to interval estimation
- Here, rather than estimating a single value of a parameter, we estimate a region of parameter space
- We make the inference that the parameter of interest lies within the defined region
- The coverage of an interval estimator is the fraction of times the parameter actually lies within the interval
- The idea of interval estimation is intimately linked to the notion of confidence intervals
23. Example
- Suppose I'm interested in estimating the mean of a normal distribution with known variance of 1 from a sample of 10 observations
- I construct an interval estimator whose width depends on a constant a
- The coverage properties of this estimator vary with a (see the sketch below)
- If I choose a to be 0.62, I would have coverage of 95%
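A plausible form for this estimator (an assumption, since the formula itself is not shown here) is the symmetric interval (x̄ − a, x̄ + a). For a N(μ, 1) sample of size n = 10 its coverage is

$$P\big(\bar{X} - a < \mu < \bar{X} + a\big) = 2\Phi\big(a\sqrt{n}\big) - 1,$$

which equals 0.95 when a√10 ≈ 1.96, i.e. a ≈ 0.62.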
24. Confidence intervals
- It is a short step from here to the notion of confidence intervals
- We find an interval estimator of the parameter that, for any value of the parameter that might be possible, has the desired coverage properties
- We then apply this interval estimator to our observed data to get a confidence interval
- We can guarantee that among repeat performances of the same experiment the true value of the parameter would be in this interval 95% of the time
- We cannot say "There is a 95% chance of the true parameter being in this interval"
25. Example: confidence intervals for the normal distribution
- Creating confidence intervals for the mean of normal distributions is relatively easy because the coverage properties of interval estimators do not depend on the mean (for a fixed variance)
- For example, the interval estimator below has 95% coverage for any mean
- As you'll see later, there is an intimate link between confidence intervals and hypothesis testing
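The estimator in question is presumably the standard one (an assumption on my part): for known standard deviation σ and sample size n, the interval

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\;\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$

has 95% coverage whatever the true mean.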
26. Example: confidence intervals for the exponential distribution
- For most distributions, the coverage properties of an estimator will depend on the true underlying parameter
- However, we can make use of the CLT to make confidence intervals for means
- For example, for the exponential distribution with different means, the graph shows the coverage properties of the interval estimator (n = 100); a simulation sketch follows below
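A sketch of the kind of coverage check such a graph summarises, assuming the CLT interval x̄ ± 1.96·x̄/√n (for the exponential the standard deviation equals the mean, so x̄ is also the plug-in estimate of the standard deviation; the exact estimator used on the slide is not shown here):

```python
import numpy as np

def coverage(mean, n=100, reps=20_000, seed=0):
    """Fraction of simulated exponential samples whose CLT interval covers the true mean."""
    rng = np.random.default_rng(seed)
    xbar = rng.exponential(scale=mean, size=(reps, n)).mean(axis=1)
    half_width = 1.96 * xbar / np.sqrt(n)   # plug-in standard error: sd = mean for the exponential
    return np.mean((xbar - half_width < mean) & (mean < xbar + half_width))

for mean in (0.5, 1.0, 5.0, 20.0):
    print(mean, coverage(mean))
```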
27. Confidence intervals and likelihood
- Thanks to the CLT, there is another useful result that allows us to define confidence intervals from the log-likelihood surface
- Specifically, the set of parameter values for which the log-likelihood is not more than 1.92 less than its maximum will define a 95% confidence interval (see below)
- In the limit of large sample size the LRT statistic is approximately chi-squared distributed under the null
- This is a very useful result, but shouldn't be assumed to hold
  - i.e. check with simulation
28. Bayesian estimators
- As you may notice, the notion of a confidence interval is very hard to grasp and has remarkably little connection to the data that you have collected
- It seems much more natural to attempt to make statements about which parameter values are likely given the data you have collected
- To put this on a rigorous probabilistic footing, we want to make statements about the probability (density) of any particular parameter value given our data
- We use Bayes' theorem:

$$\underbrace{P(\theta \mid \text{data})}_{\text{posterior}} = \frac{\overbrace{P(\text{data} \mid \theta)}^{\text{likelihood}}\;\overbrace{P(\theta)}^{\text{prior}}}{\underbrace{P(\text{data})}_{\text{normalising constant}}}$$
29. Bayes estimators
- The single most important conceptual difference between Bayesian statistics and frequentist statistics is the notion that the parameters you are interested in are themselves random variables
- This notion is encapsulated in the use of a subjective prior for your parameters
- Remember that to construct a confidence interval we have to define the set of possible parameter values
- A prior does the same thing, but also gives a weight to different values
30. Example: coin tossing
- I toss a coin twice and observe two heads
- I want to perform inference about the probability of obtaining a head on a single throw for the coin in question
- The point estimate/MLE for the probability is 1.0, yet I have a very strong prior belief that the answer is 0.5
- Bayesian statistics forces the researcher to be explicit about prior beliefs but, in return, can be very specific about what information has been gained by performing the experiment (see the sketch below)
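One way to make this concrete (the Beta prior is my choice for illustration; the slide does not specify one): with a Beta(α, β) prior on the head probability p and two heads observed in two tosses,

$$\pi(p \mid \text{data}) \propto p^{2}\, p^{\alpha-1}(1-p)^{\beta-1} \;\Rightarrow\; p \mid \text{data} \sim \mathrm{Beta}(\alpha+2,\ \beta),$$

so a strong prior belief in fairness such as Beta(50, 50) gives a posterior mean of 52/102 ≈ 0.51, far from the MLE of 1.0.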
31. The posterior
- Bayesian inference about parameters is contained in the posterior distribution
- The posterior can be summarised in various ways, e.g. the posterior mean or a credible interval
[Figure: prior and posterior densities, with the posterior mean and a credible interval indicated]
32. Bayesian inference and the notion of shrinkage
- The notion of shrinkage is that you can obtain better estimates by assuming a certain degree of similarity among the things you want to estimate
- Practically, this means two things:
  - Borrowing information across observations
  - Penalising inferences that are very different from anything else
- The notion of shrinkage is implicit in the use of priors in Bayesian statistics
- There are also forms of frequentist inference where shrinkage is used
  - But NOT MLE