1
Resampling techniques
  • Why resampling?
  • Jackknife
  • Cross-validation
  • Bootstrap
  • Examples of application of bootstrap

2
Why resampling?
  • The purpose of statistics is to estimate parameter(s) and assess their reliability. Since estimators are functions of the sample points, they are random variables. If we could find the distribution of this random variable (the sample statistic), we could estimate the reliability of the estimator. Unfortunately, apart from the simplest cases, the sampling distribution is not easy to derive. Several techniques exist for approximating these distributions, including the Edgeworth series, the Laplace approximation and saddle-point approximations; these give an analytical form for the approximate distribution. With the advent of computers, more computationally intensive methods are emerging, and in many cases they work satisfactorily.
  • If we had the sampling distribution of a statistic we could estimate the variance of the estimator, construct interval estimates and even test hypotheses. Examples of the simplest cases where the sampling distribution is known include the following (summarised in the formulas at the end of this slide).
  • The sample mean, when the sample is from a normal distribution: it is normally distributed with mean equal to the population mean and variance equal to the population variance divided by the sample size, when the population variance is known. If the population variance is not known, the variance of the sample mean is estimated by the sample variance divided by n.
  • The sample variance has the distribution of a multiple of a χ2 distribution; again this is valid if the population distribution is normal.
  • The sample mean divided by the square root of the sample variance has a multiple of the t distribution (again in the normal case).
  • For independent samples, the ratio of one sample variance to the other has a multiple of the F distribution.
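For the normal case these sampling distributions can be written explicitly. A brief summary of these standard results (with s2 denoting the sample variance and μ, σ2 the population mean and variance) is:

    \bar{x} \sim N\!\left(\mu,\; \sigma^2/n\right), \qquad
    \frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad
    \frac{\bar{x}-\mu}{s/\sqrt{n}} \sim t_{n-1}, \qquad
    \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}.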

3
Jackknife
  • The jackknife is used for bias removal. The mean-square error of an estimator equals the square of its bias plus its variance, so if the bias is much larger than the variance, the jackknife can be useful under some circumstances.
  • Description of the jackknife: assume we have a sample of size n. We estimate the statistic of interest using all the data, giving tn. Then, removing one point at a time, we estimate tn-1,i, where the subscripts indicate the size of the sample and the index of the removed sample point. The new estimator is then derived as shown at the end of this slide.
  • If the order of the bias of the statistic tn is O(n-1), then after the jackknife the order of the bias becomes O(n-2).
  • The variance is estimated using the formula given below.
  • This procedure can be applied iteratively, i.e. the jackknife can be applied again to the new estimator. The first application of the jackknife can reduce bias without changing the variance of the estimator, but second and higher-order applications can in general increase the variance of the estimator.
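The jackknife estimator and its variance estimate, in their standard form (writing the average of the leave-one-out estimates as a separate quantity), are:

    \bar{t}_{n-1,\cdot} = \frac{1}{n}\sum_{i=1}^{n} t_{n-1,i}, \qquad
    t_{\mathrm{jack}} = n\,t_n - (n-1)\,\bar{t}_{n-1,\cdot}, \qquad
    \widehat{\mathrm{var}}(t_{\mathrm{jack}}) = \frac{n-1}{n}\sum_{i=1}^{n}\bigl(t_{n-1,i}-\bar{t}_{n-1,\cdot}\bigr)^2.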

4
Cross-validation
  • Cross-validation is a resampling technique to
    overcome overfitting.
  • Let us consider the least-squares technique. Assume we have a sample of size n, y = (y1, y2, ..., yn), and we want to estimate parameters θ = (θ1, θ2, ..., θm). Further assume that the mean value of the observations is a function of these parameters (we may not know the form of this function). We can postulate that the function has a form g and then find the values of the parameters using the least-squares technique (the minimised quantity is sketched at the end of this slide),
  • where X is a fixed matrix or a matrix of random variables. After this we have values of the parameters and therefore the form of the function. The form of the function g defines the model we want to use, and we may consider several candidate forms. Obviously, with more parameters the fit will be better. The question is what would happen if we observed new values. Using the estimated values of the parameters we could compute the squared differences for new observations. Say we have new observations (yn+1, ..., yn+l). Can our function predict them? Which function predicts better? To answer these questions we calculate the prediction error PE (also sketched below),
  • where PE is the prediction error. The function g that gives the smallest value of PE has the higher predictive power. A function that gives a smaller fitting error h but a larger PE is called an overfitted function.
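A sketch of the quantities referred to above, assuming h denotes the residual sum of squares of the fit and that the mean of the observations is modelled as g(X, θ):

    h(\theta) = \sum_{i=1}^{n}\bigl(y_i - g(x_i,\theta)\bigr)^2, \qquad
    \hat{\theta} = \arg\min_{\theta}\, h(\theta), \qquad
    PE = \frac{1}{l}\sum_{j=1}^{l}\bigl(y_{n+j} - g(x_{n+j},\hat{\theta})\bigr)^2.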

5
Cross-validation Cont.
  • When we choose the function using the current sample, how can we avoid overfitting? Cross-validation is an approach to dealing with this problem.
  • Description of cross-validation: we have a sample of size n.
  • Divide the sample into K roughly equal-sized parts.
  • For the kth part, estimate the parameters using the K-1 parts excluding the kth part, and calculate the prediction error for the kth part.
  • Repeat this for all k = 1, 2, ..., K and combine all the prediction errors to get the cross-validation prediction error.
  • If K = n we have the leave-one-out cross-validation technique. Let us denote the estimate at the kth step by θ(-k) (in vector form), let the kth subset of the sample be Ak, and let the number of points in this subset be Nk. The prediction error per observation is then calculated as in the sketch at the end of this slide.
  • We then choose the function that gives the smallest prediction error; we can expect that for future observations this function will also give the smallest prediction error.
  • This technique is widely used in modern statistical analysis and is not restricted to the least-squares technique: instead of least squares we could use any other criterion dependent on the distribution of the observations. In principle it can be applied to various maximum-likelihood and other estimators.
  • Cross-validation is useful for model selection, i.e. if we have several candidate models, cross-validation is used to select one of them.
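A minimal sketch of K-fold cross-validation for a least-squares fit in Python. The data and the helper names fit and predict are illustrative assumptions, not part of the original slides; the returned value is the combined prediction error per observation described above.

    import numpy as np

    def kfold_cv_error(X, y, fit, predict, K=5, seed=0):
        """K-fold cross-validation prediction error per observation."""
        n = len(y)
        idx = np.random.default_rng(seed).permutation(n)
        folds = np.array_split(idx, K)              # K roughly equal parts A_k
        total_sq_err = 0.0
        for k in range(K):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(K) if j != k])
            theta = fit(X[train], y[train])         # estimate theta without the kth part
            resid = y[test] - predict(X[test], theta)
            total_sq_err += np.sum(resid ** 2)      # prediction error on the kth part
        return total_sq_err / n                     # combined error per observation

    # Illustrative linear model y = X theta + noise
    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
    fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
    predict = lambda X, theta: X @ theta
    print(kfold_cv_error(X, y, fit, predict, K=10))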

6
Bootstrap
  • The bootstrap is a computationally very expensive technique. In a very simple form it works as follows.
  • We have a sample of size n and want to estimate some parameter θ; the estimator for this parameter gives t. To each sample point we assign a probability (usually 1/n, i.e. all sample points have equal probability). Then from this sample we draw, with replacement, another random sample of size n and estimate θ. Let us denote the estimate of the parameter at the jth resampling stage by tj. The bootstrap estimator for θ and its variance are calculated as shown at the end of this slide.
  • This is a very simple form of bootstrap resampling. For parameter estimation the number of bootstrap resamples is usually chosen to be around 200.
  • Let us analyse how the bootstrap works in one simple case. Consider a random variable X with sample space x = (x1, ..., xM), where each point has probability fj = P(X = xj), i.e.
  • f = (f1, ..., fM) represents the distribution of the population. A sample of size n has, for each sample point, the relative frequencies given below.
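In a standard form, with B resamples and nj counting how often the point xj occurs in the sample, the omitted expressions are:

    t_B = \frac{1}{B}\sum_{j=1}^{B} t_j, \qquad
    \widehat{\mathrm{var}}(t) = \frac{1}{B-1}\sum_{j=1}^{B}\bigl(t_j - t_B\bigr)^2, \qquad
    \hat{f}_j = \frac{n_j}{n},\; j = 1,\dots,M.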

7
Bootstrap Cont.
  • Then the distribution of the sample relative frequencies, conditional on f, is a multinomial distribution.
  • The multinomial distribution is an extension of the binomial distribution; its form is given at the end of this slide.
  • The limiting distribution of the (suitably scaled) difference between the sample frequencies and f is a multinormal distribution. If we resample from the given sample, then we should consider the conditional distribution of the resampled frequencies given the sample frequencies (which is also a multinomial distribution).
  • The limiting distribution of the resampled frequencies is the same as the conditional distribution of the original sample. Since these two distributions converge to the same distribution, well-behaved functions of them also have the same limiting distributions. Thus, if we use the bootstrap to derive the distribution of a sample statistic, we can expect that in the limit it will converge to the distribution of the sample statistic, i.e. the corresponding functions of the original and of the resampled frequencies have the same limiting distributions.
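A sketch of the omitted expressions, assuming the usual notation in which nj is the count of xj in the sample, \hat{f} is the vector of sample relative frequencies and f* the vector of resampled relative frequencies:

    P(n_1,\dots,n_M \mid f) = \frac{n!}{n_1!\cdots n_M!}\,\prod_{j=1}^{M} f_j^{\,n_j}, \qquad
    \sqrt{n}\,(\hat{f} - f) \xrightarrow{d} N(0,\Sigma_f), \qquad
    \sqrt{n}\,(f^{*} - \hat{f}) \mid \hat{f} \xrightarrow{d} N(0,\Sigma_f),

where Σf is the multinomial covariance matrix with elements fj(δjk − fk).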

8
Bootstrap Cont.
  • If we could enumerate all possible resamples from our sample, we could build the ideal bootstrap distribution. In practice, even with modern computers, this is impossible to achieve; instead, Monte Carlo simulation is used. It usually works as follows.
  • Draw a random sample of size n, with replacement, from the given sample.
  • Estimate the parameter and get the estimate tj.
  • Repeat this B times and build the frequency and cumulative distributions for t.
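A minimal sketch of this Monte Carlo loop in Python; the statistic (here the median) and B = 200 are illustrative choices rather than anything prescribed by the slides.

    import numpy as np

    def bootstrap(sample, statistic, B=200, seed=0):
        """Monte Carlo bootstrap: resample with replacement B times."""
        rng = np.random.default_rng(seed)
        n = len(sample)
        t = np.empty(B)
        for j in range(B):
            resample = rng.choice(sample, size=n, replace=True)  # draw with replacement
            t[j] = statistic(resample)                           # estimate t_j
        return t                          # bootstrap replicates t_1, ..., t_B

    data = np.random.default_rng(1).exponential(size=50)
    t = bootstrap(data, np.median, B=200)
    print(t.mean(), t.var(ddof=1))        # bootstrap estimate and its variance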

9
Bootstrap Cont.
  • How do we build the cumulative distribution (which approximates the distribution function)? Consider a sample of size n, x = (x1, x2, ..., xn). The cumulative distribution is then built as in the first formula at the end of this slide,
  • where I denotes the indicator function.
  • Another way of building the cumulative distribution is to sort the data first, so that the observations are in increasing order, and then build the cumulative distribution from the sorted values.
  • We can also build a histogram that approximates the density of the distribution. First we divide an interval containing all our data into equal subintervals of length Δt. Assume that the centre of the ith subinterval is ti; the histogram can then be calculated using the last formula below.
  • Once we have the distribution of the statistic we can use it for various purposes. Bootstrap estimation of the parameter and its variance is one possible application; we can also use this distribution for hypothesis testing, interval estimation, etc. For pure parameter estimation we need around 200 resamples, while for interval estimation we might need around 2000, because interval estimation and hypothesis testing require a more accurate distribution.
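A sketch of the omitted formulas, assuming the usual empirical definitions (with x(1) ≤ x(2) ≤ ... ≤ x(n) denoting the sorted values):

    \hat{F}(t) = \frac{1}{n}\sum_{i=1}^{n} I\bigl(x_i \le t\bigr), \qquad
    \hat{F}\bigl(x_{(i)}\bigr) = \frac{i}{n}, \qquad
    \hat{h}(t_i) = \frac{\#\{j :\, |x_j - t_i| \le \Delta t/2\}}{n\,\Delta t}.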

10
Bootstrap Cont.
  • Since resampling used no assumption about the population distribution, this bootstrap is called the non-parametric bootstrap. If we have some idea about the population distribution, we can use it in resampling, i.e. instead of drawing randomly from our sample we can draw from the assumed population distribution. For example, if we know that the population distribution is normal, we can estimate its parameters from our sample (the sample mean and variance), approximate the population distribution with this fitted distribution and use it to draw new samples. As might be expected, if the assumption about the population distribution is correct, the parametric bootstrap will perform better; if it is not correct, the non-parametric bootstrap will outperform its parametric counterpart.
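A minimal sketch of the parametric bootstrap under a normal assumption in Python; the statistic (the median) and B are again illustrative assumptions.

    import numpy as np

    def parametric_bootstrap_normal(sample, statistic, B=200, seed=0):
        """Parametric bootstrap: fit a normal, then resample from the fit."""
        rng = np.random.default_rng(seed)
        mu, sigma = sample.mean(), sample.std(ddof=1)   # fit N(mu, sigma^2) to the sample
        n = len(sample)
        return np.array([statistic(rng.normal(mu, sigma, size=n))
                         for _ in range(B)])

    data = np.random.default_rng(2).normal(loc=5.0, scale=2.0, size=50)
    t = parametric_bootstrap_normal(data, np.median)
    print(t.mean(), t.var(ddof=1))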

11
Bootstrap Some simple applications
  • Linear model: 1) Estimate the parameters. 2) Calculate the fitted values.
  • 3) Calculate the residuals (observations minus fitted values).
  • 4) Draw n random representatives from r (call them r_random), add them to the fitted values and calculate new observations.
  • 5) Estimate new parameters and save them.
  • 6) Go back to step 4 and repeat for the required number of resamples. A sketch of these steps is given at the end of this slide.
  • Generalised linear models: the procedure is as in the linear-model case with small modifications. 1) The residuals can be calculated in a standardised form appropriate to the model (e.g. Pearson-type residuals).
  • 2) When calculating the new observations, make sure that they are similar to the original observations, e.g. in the binomial case make sure that the values are 0 or 1.
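A minimal sketch of the residual bootstrap for the linear model in Python, following steps 1-6 above; the data, B and the least-squares call are illustrative assumptions.

    import numpy as np

    def residual_bootstrap_lm(X, y, B=200, seed=0):
        """Residual bootstrap for the linear model y = X beta + error."""
        rng = np.random.default_rng(seed)
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # 1) estimate parameters
        fitted = X @ beta_hat                              # 2) fitted values
        resid = y - fitted                                 # 3) residuals
        betas = np.empty((B, X.shape[1]))
        for b in range(B):
            r_random = rng.choice(resid, size=len(y), replace=True)  # 4) resample residuals
            y_new = fitted + r_random                                 #    new observations
            betas[b] = np.linalg.lstsq(X, y_new, rcond=None)[0]       # 5) re-estimate, save
        return betas                                       # 6) repeated B times

    rng = np.random.default_rng(3)
    X = np.column_stack([np.ones(80), rng.normal(size=80)])
    y = X @ np.array([0.5, 1.5]) + rng.normal(size=80)
    betas = residual_bootstrap_lm(X, y)
    print(betas.mean(axis=0), betas.std(axis=0, ddof=1))   # bootstrap estimates and s.e.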