1
Resampling techniques
  • Why resampling?
  • Jackknife
  • Cross-validation
  • Bootstrap
  • Examples of application of bootstrap

2
Why resampling?
  • The purpose of statistics is to estimate parameter(s) and assess their reliability. Since estimators are functions of the sample points, they are random variables. If we could find the distribution of this random variable (the sample statistic), we could estimate the reliability of the estimator. Unfortunately, apart from the simplest cases, the sampling distribution is not easy to derive. Several techniques exist for approximating these distributions, including the Edgeworth series, the Laplace approximation and saddle-point approximations; these give an analytical form for the approximate distribution. With the advent of computers, more computationally intensive methods are emerging, and in many cases they work satisfactorily.
  • If we had the sampling distribution of a statistic we could estimate the variance of the estimator, construct interval estimates and even test hypotheses. Examples of the simplest cases where the sampling distribution is known include the following (summarised in the formulas at the end of this slide).
  • The sample mean, when the sample is from a normal distribution: it is normally distributed with mean equal to the population mean and variance equal to the population variance divided by the sample size, when the population variance is known. If the population variance is not known, the variance of the sample mean is estimated by the sample variance divided by n.
  • The sample variance has the distribution of a multiple of a χ2 distribution; again this is valid if the population distribution is normal.
  • The sample mean divided by the square root of the sample variance has a multiple of the t distribution (again in the normal case).
  • For independent samples, the ratio of one sample variance to the other has a multiple of the F distribution.
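For the normal case these sampling distributions can be written explicitly. A brief summary of these standard results (with s2 denoting the sample variance and μ, σ2 the population mean and variance) is:

    \bar{x} \sim N\!\left(\mu,\; \sigma^2/n\right), \qquad
    \frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad
    \frac{\bar{x}-\mu}{s/\sqrt{n}} \sim t_{n-1}, \qquad
    \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}.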

3
Jackknife
  • The jackknife is used for bias removal. The mean-square error of an estimator equals the square of its bias plus its variance, so if the bias is much larger than the variance, the jackknife can be useful under some circumstances.
  • Description of the jackknife: assume we have a sample of size n. We estimate the statistic of interest using all the data, giving tn. Then, removing one point at a time, we estimate tn-1,i, where the subscripts indicate the size of the sample and the index of the removed sample point. The new estimator is then derived as shown at the end of this slide.
  • If the order of the bias of the statistic tn is O(n-1), then after the jackknife the order of the bias becomes O(n-2).
  • The variance is estimated using the formula given below.
  • This procedure can be applied iteratively, i.e. the jackknife can be applied again to the new estimator. The first application of the jackknife can reduce bias without changing the variance of the estimator, but second and higher-order applications can in general increase the variance of the estimator.
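The jackknife estimator and its variance estimate, in their standard form (writing the average of the leave-one-out estimates as a separate quantity), are:

    \bar{t}_{n-1,\cdot} = \frac{1}{n}\sum_{i=1}^{n} t_{n-1,i}, \qquad
    t_{\mathrm{jack}} = n\,t_n - (n-1)\,\bar{t}_{n-1,\cdot}, \qquad
    \widehat{\mathrm{var}}(t_{\mathrm{jack}}) = \frac{n-1}{n}\sum_{i=1}^{n}\bigl(t_{n-1,i}-\bar{t}_{n-1,\cdot}\bigr)^2.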

4
Cross-validation
  • Cross-validation is a resampling technique to
    overcome overfitting.
  • Let us consider the least-squares technique. Assume we have a sample of size n, y = (y1, y2, ..., yn), and we want to estimate parameters θ = (θ1, θ2, ..., θm). Further assume that the mean value of the observations is a function of these parameters (we may not know the form of this function). We can postulate that the function has a form g and then find the values of the parameters using the least-squares technique (the minimised quantity is sketched at the end of this slide),
  • where X is a fixed matrix or a matrix of random variables. After this we have values of the parameters and therefore the form of the function. The form of the function g defines the model we want to use, and we may consider several candidate forms. Obviously, with more parameters the fit will be better. The question is what would happen if we observed new values. Using the estimated values of the parameters we could compute the squared differences for new observations. Say we have new observations (yn+1, ..., yn+l). Can our function predict them? Which function predicts better? To answer these questions we calculate the prediction error PE (also sketched below),
  • where PE is the prediction error. The function g that gives the smallest value of PE has the higher predictive power. A function that gives a smaller fitting error h but a larger PE is called an overfitted function.
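A sketch of the quantities referred to above, assuming h denotes the residual sum of squares of the fit and that the mean of the observations is modelled as g(X, θ):

    h(\theta) = \sum_{i=1}^{n}\bigl(y_i - g(x_i,\theta)\bigr)^2, \qquad
    \hat{\theta} = \arg\min_{\theta}\, h(\theta), \qquad
    PE = \frac{1}{l}\sum_{j=1}^{l}\bigl(y_{n+j} - g(x_{n+j},\hat{\theta})\bigr)^2.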

5
Cross-validation Cont.
  • When we choose the function using the current sample, how can we avoid overfitting? Cross-validation is an approach to dealing with this problem.
  • Description of cross-validation: we have a sample of size n.
  • Divide the sample into K roughly equal-sized parts.
  • For the kth part, estimate the parameters using the K-1 parts excluding the kth part, and calculate the prediction error for the kth part.
  • Repeat this for all k = 1, 2, ..., K and combine all the prediction errors to get the cross-validation prediction error.
  • If K = n we have the leave-one-out cross-validation technique. Let us denote the estimate at the kth step by θ(-k) (in vector form), let the kth subset of the sample be Ak, and let the number of points in this subset be Nk. The prediction error per observation is then calculated as in the sketch at the end of this slide.
  • We then choose the function that gives the smallest prediction error; we can expect that for future observations this function will also give the smallest prediction error.
  • This technique is widely used in modern statistical analysis and is not restricted to the least-squares technique: instead of least squares we could use any other criterion dependent on the distribution of the observations. In principle it can be applied to various maximum-likelihood and other estimators.
  • Cross-validation is useful for model selection, i.e. if we have several candidate models, cross-validation is used to select one of them.
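A minimal sketch of K-fold cross-validation for a least-squares fit in Python. The data and the helper names fit and predict are illustrative assumptions, not part of the original slides; the returned value is the combined prediction error per observation described above.

    import numpy as np

    def kfold_cv_error(X, y, fit, predict, K=5, seed=0):
        """K-fold cross-validation prediction error per observation."""
        n = len(y)
        idx = np.random.default_rng(seed).permutation(n)
        folds = np.array_split(idx, K)              # K roughly equal parts A_k
        total_sq_err = 0.0
        for k in range(K):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(K) if j != k])
            theta = fit(X[train], y[train])         # estimate theta without the kth part
            resid = y[test] - predict(X[test], theta)
            total_sq_err += np.sum(resid ** 2)      # prediction error on the kth part
        return total_sq_err / n                     # combined error per observation

    # Illustrative linear model y = X theta + noise
    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
    fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
    predict = lambda X, theta: X @ theta
    print(kfold_cv_error(X, y, fit, predict, K=10))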

6
Bootstrap
  • The bootstrap is a computationally very expensive technique. In a very simple form it works as follows.
  • We have a sample of size n and want to estimate some parameter θ; the estimator for this parameter gives t. To each sample point we assign a probability (usually 1/n, i.e. all sample points have equal probability). Then from this sample we draw, with replacement, another random sample of size n and estimate θ. Let us denote the estimate of the parameter at the jth resampling stage by tj. The bootstrap estimator for θ and its variance are calculated as shown at the end of this slide.
  • This is a very simple form of bootstrap resampling. For parameter estimation the number of bootstrap resamples is usually chosen to be around 200.
  • Let us analyse how the bootstrap works in one simple case. Consider a random variable X with sample space x = (x1, ..., xM), where each point has probability fj = P(X = xj), i.e.
  • f = (f1, ..., fM) represents the distribution of the population. A sample of size n has, for each sample point, the relative frequencies given below.
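In a standard form, with B resamples and nj counting how often the point xj occurs in the sample, the omitted expressions are:

    t_B = \frac{1}{B}\sum_{j=1}^{B} t_j, \qquad
    \widehat{\mathrm{var}}(t) = \frac{1}{B-1}\sum_{j=1}^{B}\bigl(t_j - t_B\bigr)^2, \qquad
    \hat{f}_j = \frac{n_j}{n},\; j = 1,\dots,M.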

7
Bootstrap Cont.
  • Then the distribution of the sample relative frequencies, conditional on f, is a multinomial distribution.
  • The multinomial distribution is an extension of the binomial distribution; its form is given at the end of this slide.
  • The limiting distribution of the (suitably scaled) difference between the sample frequencies and f is a multinormal distribution. If we resample from the given sample, then we should consider the conditional distribution of the resampled frequencies given the sample frequencies (which is also a multinomial distribution).
  • The limiting distribution of the resampled frequencies is the same as the conditional distribution of the original sample. Since these two distributions converge to the same distribution, well-behaved functions of them also have the same limiting distributions. Thus, if we use the bootstrap to derive the distribution of a sample statistic, we can expect that in the limit it will converge to the distribution of the sample statistic, i.e. the corresponding functions of the original and of the resampled frequencies have the same limiting distributions.
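A sketch of the omitted expressions, assuming the usual notation in which nj is the count of xj in the sample, \hat{f} is the vector of sample relative frequencies and f* the vector of resampled relative frequencies:

    P(n_1,\dots,n_M \mid f) = \frac{n!}{n_1!\cdots n_M!}\,\prod_{j=1}^{M} f_j^{\,n_j}, \qquad
    \sqrt{n}\,(\hat{f} - f) \xrightarrow{d} N(0,\Sigma_f), \qquad
    \sqrt{n}\,(f^{*} - \hat{f}) \mid \hat{f} \xrightarrow{d} N(0,\Sigma_f),

where Σf is the multinomial covariance matrix with elements fj(δjk − fk).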

8
Bootstrap Cont.
  • If we could enumerate all possible resamples from our sample, we could build the ideal bootstrap distribution. In practice, even with modern computers, this is impossible to achieve; instead, Monte Carlo simulation is used. It usually works as follows.
  • Draw a random sample of size n, with replacement, from the given sample.
  • Estimate the parameter and get the estimate tj.
  • Repeat this B times and build the frequency and cumulative distributions for t.
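A minimal sketch of this Monte Carlo loop in Python; the statistic (here the median) and B = 200 are illustrative choices rather than anything prescribed by the slides.

    import numpy as np

    def bootstrap(sample, statistic, B=200, seed=0):
        """Monte Carlo bootstrap: resample with replacement B times."""
        rng = np.random.default_rng(seed)
        n = len(sample)
        t = np.empty(B)
        for j in range(B):
            resample = rng.choice(sample, size=n, replace=True)  # draw with replacement
            t[j] = statistic(resample)                           # estimate t_j
        return t                          # bootstrap replicates t_1, ..., t_B

    data = np.random.default_rng(1).exponential(size=50)
    t = bootstrap(data, np.median, B=200)
    print(t.mean(), t.var(ddof=1))        # bootstrap estimate and its variance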

9
Bootstrap Cont.
  • How do we build the cumulative distribution (which approximates the distribution function)? Consider a sample of size n, x = (x1, x2, ..., xn). The cumulative distribution is then built as in the first formula at the end of this slide,
  • where I denotes the indicator function.
  • Another way of building the cumulative distribution is to sort the data first, so that the observations are in increasing order, and then build the cumulative distribution from the sorted values.
  • We can also build a histogram that approximates the density of the distribution. First we divide an interval containing all our data into equal subintervals of length Δt. Assume that the centre of the ith subinterval is ti; the histogram can then be calculated using the last formula below.
  • Once we have the distribution of the statistic we can use it for various purposes. Bootstrap estimation of the parameter and its variance is one possible application; we can also use this distribution for hypothesis testing, interval estimation, etc. For pure parameter estimation we need around 200 resamples, while for interval estimation we might need around 2000, because interval estimation and hypothesis testing require a more accurate distribution.
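A sketch of the omitted formulas, assuming the usual empirical definitions (with x(1) ≤ x(2) ≤ ... ≤ x(n) denoting the sorted values):

    \hat{F}(t) = \frac{1}{n}\sum_{i=1}^{n} I\bigl(x_i \le t\bigr), \qquad
    \hat{F}\bigl(x_{(i)}\bigr) = \frac{i}{n}, \qquad
    \hat{h}(t_i) = \frac{\#\{j :\, |x_j - t_i| \le \Delta t/2\}}{n\,\Delta t}.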

10
Bootstrap Cont.
  • Since resampling used no assumption about the population distribution, this bootstrap is called the non-parametric bootstrap. If we have some idea about the population distribution, we can use it in resampling, i.e. instead of drawing randomly from our sample we can draw from the assumed population distribution. For example, if we know that the population distribution is normal, we can estimate its parameters from our sample (the sample mean and variance), approximate the population distribution with this fitted distribution and use it to draw new samples. As might be expected, if the assumption about the population distribution is correct, the parametric bootstrap will perform better; if it is not correct, the non-parametric bootstrap will outperform its parametric counterpart.
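A minimal sketch of the parametric bootstrap under a normal assumption in Python; the statistic (the median) and B are again illustrative assumptions.

    import numpy as np

    def parametric_bootstrap_normal(sample, statistic, B=200, seed=0):
        """Parametric bootstrap: fit a normal, then resample from the fit."""
        rng = np.random.default_rng(seed)
        mu, sigma = sample.mean(), sample.std(ddof=1)   # fit N(mu, sigma^2) to the sample
        n = len(sample)
        return np.array([statistic(rng.normal(mu, sigma, size=n))
                         for _ in range(B)])

    data = np.random.default_rng(2).normal(loc=5.0, scale=2.0, size=50)
    t = parametric_bootstrap_normal(data, np.median)
    print(t.mean(), t.var(ddof=1))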

11
Bootstrap Some simple applications
  • Linear model: 1) Estimate the parameters. 2) Calculate the fitted values.
  • 3) Calculate the residuals (observations minus fitted values).
  • 4) Draw n random representatives from r (call them r_random), add them to the fitted values and calculate new observations.
  • 5) Estimate new parameters and save them.
  • 6) Go back to step 4 and repeat for the required number of resamples. A sketch of these steps is given at the end of this slide.
  • Generalised linear models: the procedure is as in the linear-model case with small modifications. 1) The residuals can be calculated in a standardised form appropriate to the model (e.g. Pearson-type residuals).
  • 2) When calculating the new observations, make sure that they are similar to the original observations, e.g. in the binomial case make sure that the values are 0 or 1.
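A minimal sketch of the residual bootstrap for the linear model in Python, following steps 1-6 above; the data, B and the least-squares call are illustrative assumptions.

    import numpy as np

    def residual_bootstrap_lm(X, y, B=200, seed=0):
        """Residual bootstrap for the linear model y = X beta + error."""
        rng = np.random.default_rng(seed)
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # 1) estimate parameters
        fitted = X @ beta_hat                              # 2) fitted values
        resid = y - fitted                                 # 3) residuals
        betas = np.empty((B, X.shape[1]))
        for b in range(B):
            r_random = rng.choice(resid, size=len(y), replace=True)  # 4) resample residuals
            y_new = fitted + r_random                                 #    new observations
            betas[b] = np.linalg.lstsq(X, y_new, rcond=None)[0]       # 5) re-estimate, save
        return betas                                       # 6) repeated B times

    rng = np.random.default_rng(3)
    X = np.column_stack([np.ones(80), rng.normal(size=80)])
    y = X @ np.array([0.5, 1.5]) + rng.normal(size=80)
    betas = residual_bootstrap_lm(X, y)
    print(betas.mean(axis=0), betas.std(axis=0, ddof=1))   # bootstrap estimates and s.e.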