Resampling techniques - PowerPoint PPT Presentation

Provided by: gar115
1
Resampling techniques
  • Why resampling?
  • Jackknife
  • Cross-validation
  • Bootstrap
  • Examples of application of bootstrap

2
Why resampling?
  • One of the purposes of statistics is to estimate
    parameters and the reliability of those
    estimates. Since estimators are functions of the
    sample points, they are random variables. If we
    could find the distribution of this random
    variable (the sample statistic), we could assess
    the reliability of the estimator. Unfortunately,
    apart from the simplest cases, the sampling
    distribution is not easy to derive. Several
    techniques approximate it analytically, including
    Edgeworth series, the Laplace approximation and
    saddle-point approximations. With the advent of
    computers, computationally intensive methods have
    emerged. They work satisfactorily in many cases.
  • Examples of the simplest cases, where the sampling
    distribution is known, include:
  • The sample mean, when the sample is from a
    normally distributed population, has a normal
    distribution with mean equal to the population
    mean and variance equal to the population
    variance divided by the sample size n, if the
    population variance is known. If the population
    variance is not known, the variance of the sample
    mean is estimated by the sample variance divided
    by n.
  • The sample variance is distributed as a multiple
    of the χ2 distribution with n-1 degrees of
    freedom. Again, this is valid if the population
    distribution is normal and the sample points are
    independent.
  • The sample mean (centred at the population mean)
    divided by the square root of the sample variance
    is distributed as a multiple of the t
    distribution with n-1 degrees of freedom, again
    in the normal and independent case.
  • For two independent samples from normal
    distributions, the ratio of the sample variances
    is distributed as a multiple of the
    F distribution.

3
Simulation technique
  • One very simple yet powerful way of generating
    the distribution of a statistic is simulation.
    We state assumptions about the population,
    generate random samples under these assumptions
    and calculate the statistic for each generated
    sample. This is usually repeated 1000-10000
    times. From the results we can build the density
    and the cumulative distribution function, design
    tests or make any other inference we would like.
  • The procedure works as follows:
  • Repeat N times:
  • 1) Generate a sample of the required size under
    the assumptions.
  • 2) Calculate the desired statistic and store it.
  • After this procedure it is as if we had repeated
    the experiment/observations N times. The
    procedure works if the assumptions are valid or
    if the distribution of the statistic is not
    sensitive to departures from the assumptions.

4
Simulation techniques
  • We can do this type of simulation in R. Let us
    write a small generic function that simulates a
    desired statistic n times:
  • fsimul <- function(n, fstat, ...) {
  •   res <- numeric(n)        # storage for the n simulated values
  •   for (i in 1:n) {
  •     res[i] <- fstat(...)   # each call draws new random samples
  •   }
  •   res
  • }
  • The only thing remaining is to write a small
    function for the statistic whose distribution we
    want to generate. Let us say that we want to
    simulate the distribution of the ratio of the
    variances of two independent samples when both
    samples are from normal distributions:
  • vrstat <- function(k1, k2, mn1, mn2, sdd1, sdd2) {
  •   var(rnorm(k1, mean = mn1, sd = sdd1)) /
  •     var(rnorm(k2, mean = mn2, sd = sdd2))
  • }
  • Once we have both functions we can generate the
    distribution of the ratio of variances (e.g. for
    sample sizes of 10 and 15 from a population with
    N(0,1)):
  • vrdist <- fsimul(10000, vrstat, 10, 15, 0.0, 0.0, 1.0, 1.0)

5
Simulation techniques
  • Using the 10000 values we can build the desired
    distribution. From theory we know that the ratio
    of the variances of independent random samples of
    sizes 10 and 15, drawn from normal populations
    with equal variances, has the F distribution with
    (9,14) degrees of freedom. The figure shows that
    the theoretical and simulated distributions are
    very similar.
  • We can use this distribution for tests and other
    purposes. To find quantiles we can use the R
    function quantile:
  • quantile(vrdist, c(0.025, 0.975))
  • To generate the cumulative distribution function
    we can use ecdf (empirical cumulative
    distribution function):
  • ecv <- ecdf(vrdist)   # generate the empirical cumulative distribution function
  • plot(ecv)             # plot the ecdf
  • ecv(value)            # calculate the probability P(x <= value)

Figure: bars show the simulated distribution; the red
line shows the theoretical distribution.
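
The agreement between the simulated and the theoretical distribution can also be checked numerically. The following is a minimal sketch, not the presenter's code: it uses replicate instead of the fsimul function, an arbitrary seed, and an arbitrary choice of probabilities, and compares simulated quantiles with those of the F distribution with (9,14) degrees of freedom.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

# Ratio of sample variances of two independent N(0,1) samples of sizes 10 and 15
vrdist <- replicate(10000, var(rnorm(10)) / var(rnorm(15)))

# Compare simulated quantiles with the theoretical F(9, 14) quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
comparison <- cbind(
  simulated   = quantile(vrdist, probs),
  theoretical = qf(probs, df1 = 9, df2 = 14)
)
print(round(comparison, 3))

# Optional plot in the spirit of the figure (bars and red line):
# hist(vrdist, breaks = 100, freq = FALSE, xlim = c(0, 6))
# curve(df(x, 9, 14), add = TRUE, col = "red")
```

With 10000 simulated values the two columns should agree to within a few percent; the extreme quantiles are the noisiest.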
6
Simulation techniques
  • Two small exercises
  • Generate simulated distribution (density and
    cumulative) for a) sample maximum, b) sample
    minimum c) sample range (the distributions of
    the sample maximum and the sample minimum are
    known as extreme value distributions)
  • Generate simulated density for the sample range
    divided by the sample standard deviation.

7
Resampling techniques
  • Resampling techniques use the data themselves to
    estimate bias, prediction error and distribution
    functions.
  • Three popular computer-intensive resampling
    techniques are:
  • Jackknife. A useful tool for bias removal. It may
    work fine for medium and large samples.
  • Cross-validation. A very useful technique for
    model selection. It may help to choose the best
    model among those under consideration.
  • Bootstrap. Perhaps one of the most important
    resampling techniques. It can reduce bias and
    give the variance of an estimator. Moreover, it
    can give the distribution of the statistic under
    consideration. This distribution can be used for
    a wide variety of purposes, such as interval
    estimation and hypothesis testing.

8
Jackknife
  • The jackknife is used for bias removal. As we
    know, the mean-square error of an estimator is
    equal to the square of the bias plus the variance
    of the estimator. If the bias is much larger than
    the variance then under some circumstances the
    jackknife could be used.
  • Description of the jackknife: Let us assume that
    we have a sample of size n. We calculate some
    sample statistic $t_n$ using all the data. Then,
    by removing one point at a time, we calculate
    $t_{n-1,i}$, where the subscripts indicate the
    size of the sample and the index of the removed
    sample point. The new estimator is then derived as
    $t_{jack} = n\,t_n - (n-1)\,\bar{t}_{n-1}$, where
    $\bar{t}_{n-1} = \frac{1}{n}\sum_{i=1}^{n} t_{n-1,i}$
  • If the order of the bias of the statistic $t_n$
    is $O(n^{-1})$ then after the jackknife the order
    of the bias becomes $O(n^{-2})$.
  • The variance is estimated using (in the usual
    jackknife form)
    $V_{jack} = \frac{n-1}{n}\sum_{i=1}^{n}(t_{n-1,i} - \bar{t}_{n-1})^2$
  • This procedure can be applied iteratively, i.e.
    the jackknife can be applied again to the new
    estimator. The first application of the jackknife
    can reduce the bias without changing the variance
    of the estimator, but its second and higher order
    applications can in general increase the variance
    of the estimator.

9
Jackknife: An example
  • Let us take a data set of size 12 and perform the
    jackknife for the mean value.

     Data                                              Mean
 0)  368 390 379 260 404 318 352 359 216 222 283 332   323.5833

Jackknife samples
 1)      390 379 260 404 318 352 359 216 222 283 332   319.5455
 2)  368     379 260 404 318 352 359 216 222 283 332   317.5455
 3)  368 390     260 404 318 352 359 216 222 283 332   318.5455
 4)  368 390 379     404 318 352 359 216 222 283 332   329.3636
 5)  368 390 379 260     318 352 359 216 222 283 332   316.2727
 6)  368 390 379 260 404     352 359 216 222 283 332   324.0909
 7)  368 390 379 260 404 318     359 216 222 283 332   321.0000
 8)  368 390 379 260 404 318 352     216 222 283 332   320.3636
 9)  368 390 379 260 404 318 352 359     222 283 332   333.3636
10)  368 390 379 260 404 318 352 359 216     283 332   332.8182
11)  368 390 379 260 404 318 352 359 216 222     332   327.2727
12)  368 390 379 260 404 318 352 359 216 222 283       322.8182

tjack = 12 x 323.5833 - 11 x mean(t) = 323.5833, where
mean(t) is the mean of the 12 jackknife sample means.
It is equal to the sample mean. This is not
surprising, since the mean is an unbiased estimator.
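
The calculation above can be reproduced in R. This is a minimal sketch: the helper name jackknife and its return structure are made up for illustration, and the variance line uses the usual jackknife variance formula, (n-1)/n times the sum of squared deviations of the leave-one-out estimates from their mean.

```r
d1 <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)

# Hypothetical helper: jackknife estimate and variance for a statistic `stat`
jackknife <- function(x, stat) {
  n <- length(x)
  t_n <- stat(x)                                         # estimate from all data
  t_loo <- sapply(seq_len(n), function(i) stat(x[-i]))   # leave-one-out estimates
  t_jack <- n * t_n - (n - 1) * mean(t_loo)              # bias-corrected estimate
  v_jack <- (n - 1) / n * sum((t_loo - mean(t_loo))^2)   # jackknife variance
  list(estimate = t_jack, variance = v_jack, leave_one_out = t_loo)
}

jk <- jackknife(d1, mean)
round(jk$leave_one_out, 4)   # the 12 jackknife sample means from the table
jk$estimate                  # equals mean(d1)
jk$variance                  # for the mean this equals var(d1) / length(d1)

# A biased statistic: the variance with divisor n instead of n - 1
biased_var <- function(x) mean((x - mean(x))^2)
jackknife(d1, biased_var)$estimate   # equals the unbiased var(d1)
```

The last line shows bias removal at work: applying the jackknife to the divisor-n variance recovers the usual unbiased sample variance.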
10
Cross-validation
  • Cross-validation is a resampling technique used
    to overcome overfitting.
  • Let us consider a least-squares technique. Let us
    assume that we have a sample of size n,
    $y = (y_1, y_2, \ldots, y_n)$. We want to estimate
    the parameters
    $\theta = (\theta_1, \theta_2, \ldots, \theta_m)$.
    Now let us further assume that the mean value of
    the observations is a function of these
    parameters (we may not know the form of this
    function). We can postulate that the function has
    a form g and find the values of the parameters by
    least squares, i.e. by minimising
    $h = \sum_{i=1}^{n} (y_i - g(x_i, \theta))^2$
  • Here $x_i$ is the i-th row of X, a fixed (design)
    matrix. After minimisation of h we will have the
    values of the parameters, and therefore a
    complete definition of the function. The form of
    the function g defines the model we want to use.
    We may have several candidate forms. Obviously,
    the more parameters we have, the better the fit
    to the given sample will be. The question is what
    would happen if we had new observations. Using
    the estimated values of the parameters we could
    calculate the squared differences for them. Let
    us say we have new observations
    $(y_{n+1}, \ldots, y_{n+l})$. Can our function
    predict these new observations? Which function
    predicts future observations better? To answer
    these questions we can calculate the new
    differences
    $PE = \frac{1}{l}\sum_{i=n+1}^{n+l} (y_i - g(x_i, \hat{\theta}))^2$
  • Here PE is the prediction error and
    $\hat{\theta}$ is the estimate obtained from the
    original sample. The function g that gives the
    smallest value of PE has the highest predictive
    power. A model that gives a smaller h but a
    larger PE corresponds to an overfitted model.

11
Cross-validation Cont.
  • If we have a sample of observations, can we use
    this sample to choose among the given models?
    Cross-validation attempts to reduce overfitting
    and thus helps model selection.
  • Description of cross-validation: We have a sample
    of size n, $(y_i, x_i)$.
  • Divide the sample into K parts of roughly equal
    size.
  • For the k-th part, estimate the parameters using
    the other K-1 parts, excluding the k-th part.
    Calculate the prediction error for the k-th part.
  • Repeat this for all k = 1, 2, ..., K and combine
    all the prediction errors to get the
    cross-validation prediction error.
  • If K = n then we have the leave-one-out
    cross-validation technique.
  • Let us denote the estimate at the k-th step by
    $\hat{\theta}_k$ (it is a vector of parameters).
    Let the k-th subset of the sample be $A_k$ and
    the number of points in this subset be $N_k$.
    Then the prediction error per observation is
    $PE = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{N_k}\sum_{i \in A_k}(y_i - g(x_i, \hat{\theta}_k))^2$
  • Then we would choose the function that gives the
    smallest prediction error. We can expect that in
    future, when we have new observations, this
    function will give the smallest prediction error.
  • This technique is widely used in modern
    statistical analysis. It is not restricted to the
    least-squares technique. Instead of least squares
    we could use other techniques such as maximum
    likelihood, Bayesian estimation or M-estimation.
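
The steps above can be sketched in R. Everything specific in this example is an assumption made for illustration: the data are simulated with a quadratic mean, the candidate models are polynomials of degree 1 to 6 fitted with lm, K is 5, and the helper name cv_error is made up.

```r
set.seed(2)  # arbitrary seed, used only for reproducibility

# Made-up data: the true mean is quadratic in x
n <- 60
x <- runif(n, -2, 2)
y <- 1 + 2 * x - 1.5 * x^2 + rnorm(n)
dat <- data.frame(x = x, y = y)

# Divide the sample into K parts of roughly equal size
K <- 5
fold <- sample(rep(seq_len(K), length.out = n))

# Hypothetical helper: K-fold prediction error per observation
cv_error <- function(degree) {
  pe_k <- sapply(seq_len(K), function(k) {
    train <- dat[fold != k, ]                      # the other K-1 parts
    test  <- dat[fold == k, ]                      # the k-th part
    fit <- lm(y ~ poly(x, degree), data = train)
    mean((test$y - predict(fit, newdata = test))^2)
  })
  mean(pe_k)
}

degrees <- 1:6
# h: residual sum of squares on the full sample, never increases with degree
h  <- sapply(degrees, function(d) sum(resid(lm(y ~ poly(x, d), data = dat))^2))
pe <- sapply(degrees, cv_error)
print(data.frame(degree = degrees, h = round(h, 2), cv_pe = round(pe, 3)))
```

In the printed table h keeps decreasing as the degree grows, while the cross-validation prediction error typically stops improving once the model is flexible enough, which is the overfitting pattern described above.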

12
Bootstrap
  • The bootstrap is one of the computationally
    expensive techniques. Its simplicity and the
    increasing availability of computational power
    make it a method of choice in many applications.
    In a very simple form it works as follows.
  • We have a sample of size n. We want to estimate
    some parameter $\theta$. The estimator for this
    parameter gives t. We want the distribution of t.
    To each sample point we assign a probability
    (usually equal to 1/n, i.e. all sample points
    have equal probability). Then from this sample we
    draw, with replacement, another random sample of
    size n and estimate $\theta$. This procedure is
    repeated B times. Let us denote the estimate of
    the parameter at the j-th resampling stage by
    $t_j$. The bootstrap estimator for $\theta$ and
    its variance are calculated as
    $t_B = \frac{1}{B}\sum_{j=1}^{B} t_j$,
    $V_B = \frac{1}{B-1}\sum_{j=1}^{B}(t_j - t_B)^2$
  • This is a very simple form of application of
    bootstrap resampling. For parameter estimation,
    the number of bootstrap samples is usually chosen
    to be around 200. When the distribution is
    desired, the recommended number is around
    1000-2000.
  • Let us analyse the working of the bootstrap in
    one simple case. Consider a random variable X
    with sample (outcome) space
    $x = (x_1, \ldots, x_M)$. Each point has the
    probability $f_j$, i.e. $P(X = x_j) = f_j$.
  • $f = (f_1, \ldots, f_M)$ represents the
    distribution of the population. A sample of size
    n will have relative frequencies for each sample
    point
    $\hat{f}_j = n_j / n$, where $n_j$ is the number
    of times $x_j$ occurs in the sample.
13
Bootstrap Cont.
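  • Let $\hat{f}$ denote the vector of relative
    frequencies of the sample points in a sample of
    size n. Then the distribution of the counts
    $n\hat{f}$ conditional on f will be a multinomial
    distribution.
  • The multinomial distribution is the extension of
    the binomial distribution and is expressed as
    $P(n_1, \ldots, n_M) = \frac{n!}{n_1! \cdots n_M!} f_1^{n_1} \cdots f_M^{n_M}$,
    with $n_1 + \cdots + n_M = n$
  • The limiting distribution of
    $\sqrt{n}(\hat{f} - f)$
  • is a multinormal distribution. Now if we resample
    from the sample and denote the relative
    frequencies in the resample by $\hat{f}^*$, then
    we should consider the distribution of
    $n\hat{f}^*$ conditional on $\hat{f}$ (which is
    also a multinomial distribution, with
    probabilities $\hat{f}$).
  • The limiting distribution of
    $\sqrt{n}(\hat{f}^* - \hat{f})$
  • is the same as that for the original sample.
    Since these two distributions converge to the
    same distribution, well behaved functions of the
    sample will also have the same limiting
    distributions. Thus if we use the bootstrap to
    derive the distribution of a sample statistic we
    can expect that in the limit it will converge to
    the distribution of the sample statistic. I.e.
    the following two functions will have the same
    limiting distributions:
    $\sqrt{n}(t(\hat{f}^*) - t(\hat{f}))$ and
    $\sqrt{n}(t(\hat{f}) - t(f))$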

14
Bootstrap Cont.
  • If we could enumerate all possible resamples from
    our sample then we could build the ideal
    bootstrap distribution (the number of ordered
    resamples is n^n). In practice, even with modern
    computers, this is impossible to achieve for all
    but very small n. Usually from a few hundred to a
    few thousand bootstrap samples are used.
  • Usually the bootstrap works as follows:
  • 1) Draw a random sample of size n with
    replacement from the given sample of size n.
  • 2) Estimate the parameter and get the estimate tj.
  • 3) Repeat steps 1) and 2) B times and build the
    frequency and cumulative distributions for t.
15
Bootstrap Cont.
  • While resampling we did not use any assumption
    about the population distribution, so this
    bootstrap is a non-parametric bootstrap. If we
    have some idea about the population distribution
    then we can use it in resampling, i.e. instead of
    drawing randomly from our sample we can draw from
    the assumed population distribution. For example,
    if we know that the population distribution is
    normal then we can estimate its parameters using
    our sample (sample mean and variance). Then we
    can approximate the population distribution with
    this fitted distribution and use it to draw new
    samples. As can be expected, if the assumption
    about the population distribution is correct then
    the parametric bootstrap will perform better. If
    it is not correct then the non-parametric
    bootstrap can be expected to outperform its
    parametric counterpart.
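
A minimal sketch of the two variants side by side, assuming a normal population as in the text. The choice of the median as the statistic, the seed and B = 2000 are arbitrary and only for illustration.

```r
set.seed(4)  # arbitrary seed, used only for reproducibility

d1 <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)
n <- length(d1)
B <- 2000

# Parametric bootstrap: fit a normal distribution, then draw new samples from it
mu_hat <- mean(d1)
sd_hat <- sd(d1)
t_par <- replicate(B, median(rnorm(n, mean = mu_hat, sd = sd_hat)))

# Non-parametric bootstrap: draw with replacement from the sample itself
t_non <- replicate(B, median(sample(d1, size = n, replace = TRUE)))

# Bootstrap standard errors of the median under the two schemes
c(parametric = sd(t_par), nonparametric = sd(t_non))
```

The parametric version produces a smooth distribution of medians, while the non-parametric version can only return values built from the 12 observed points.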

16
Balanced bootstrap
  • One of the variations of bootstrap resampling is
    the balanced bootstrap. In this case, when
    resampling, one makes sure that the number of
    occurrences of each sample point is the same.
    I.e. if we make B bootstrap samples, the total
    number of occurrences of each xi over all
    bootstrap samples is equal to B. Of course, in
    each individual sample some of the observations
    will be present several times and others will be
    missing. It can be achieved as follows:
  • Let us assume that the number of sample points is
    n.
  • Repeat the numbers from 1 to n B times, giving a
    vector of length nB.
  • Rearrange this vector using a random permutation
    of the positions 1 to nB. Call the result the
    vector N of length nB.
  • Take the first n elements of N (positions 1 to n)
    and the corresponding sample points. Estimate the
    parameter of interest. Then take the second n
    elements (positions n+1 to 2n) and the
    corresponding sample points and do the
    estimation. Repeat this B times and find the
    bootstrap estimators and distributions.

17
Balanced bootstrap Example.
  • Let us assume that we have 3 sample points and
    the number of bootstrap samples we want is 3. Our
    observations are (x1,x2,x3).
  • Then we repeat the numbers from 1 to 3 three
    times:
  • 1 2 3 1 2 3 1 2 3
  • Then we take one of the random permutations of
    the positions from 1 to 3x3 = 9. E.g.
  • 4 3 9 5 6 1 2 8 7
  • Positions 4, 3, 9 of the repeated vector hold 1,
    3, 3, so first we take the observations x1,x3,x3
    and estimate the parameter.
  • Then we take x2,x3,x1 and estimate the parameter.
  • Then we take x2,x2,x1 and estimate the parameter.
  • As can be seen, each observation is present 3
    times.
  • This technique is meant to improve the results of
    bootstrap resampling.
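
The balanced bootstrap can be sketched in a few lines of R. This is an illustration only: the data are taken from the jackknife example, the statistic is the mean, and B = 1000 and the seed are arbitrary.

```r
set.seed(5)  # arbitrary seed, used only for reproducibility

d1 <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)
n <- length(d1)
B <- 1000

# Repeat the indices 1..n B times and randomly permute the whole vector
idx <- sample(rep(seq_len(n), times = B))

# Row j holds the n indices that make up the j-th bootstrap sample
idx_mat <- matrix(idx, nrow = B, ncol = n, byrow = TRUE)
tj <- apply(idx_mat, 1, function(ii) mean(d1[ii]))

table(idx)            # every index occurs exactly B times overall
mean(tj) - mean(d1)   # zero up to rounding when the statistic is the mean
```

Because every observation is used exactly B times, the average of the bootstrap means reproduces the sample mean exactly, which removes one source of simulation noise.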

18
Bootstrap in R
  • We can either write our own bootstrap resampling
    functions or use what is available in R. There is
    a generic function, boot, in the R package boot
    that can do bootstrap sampling. It is perhaps
    worth spending some time studying how this
    function works.
  • To use the boot function for a given statistic
    (let us take the mean as an example) we need to
    write a function that calculates it for given
    sample points. For example:
  • mnboot <- function(d, nn) { mean(d[nn]) }
    where nn is an integer vector of indices whose
    length is equal to the length of d
  • Now we can use the boot function from R (make
    sure that the boot package has been loaded):
  • require(boot)
  • mnb <- boot(del, mnboot, 10000)   # calculate the bootstrap estimate for the data del 10000 times

19
Bootstrap Example.
  • Let us take the example we used for the
    jackknife. We generate 10000 (simple) bootstrap
    samples and estimate the mean value for each of
    them. The figure shows the bootstrap distribution
    of the estimated parameter. This distribution can
    now be used for various purposes (variance
    estimation, interval estimation, hypothesis
    testing and so on). For comparison, the normal
    distribution with mean equal to the sample mean
    and variance equal to the sample variance divided
    by the number of elements is also shown (black
    line).

Figure: the approximation with the normal
distribution appears to be very good.
20
Bootstrap Example.
  • Once we have the bootstrap estimates we can use
    them for bias removal, variance estimation,
    interval estimation etc. The sequence of commands
    in R would be as follows:
  • # read or prepare the data and write a function for
    # the statistic you want to estimate
  • d1 <- c(368, 390, 379, 260, 404, 318, 352, 359,
            216, 222, 283, 332)
  • mnboot <- function(d, nn) { mean(d[nn]) }   # this function defines the statistic you want to calculate
  • require(boot)
  • nb <- 10000
  • mnb <- boot(d1, mnboot, nb)
  • # calculate the mean value and variance
  • mean(mnb$t)
  • var(mnb$t)
  • hist(mnb$t)
  • # calculate the 95% confidence interval
  • quantile(mnb$t, c(0.025, 0.975))
  • # estimate the empirical cumulative distribution
    # function
  • ecv <- ecdf(mnb$t)
  • plot(ecv)

21
Bootstrap intervals
  • The results of the boot command can be used to
    estimate confidence intervals (i.e. intervals
    that would cover the true value of the parameter
    in the stated proportion of cases if we repeated
    the experiment many times). They can be
    calculated using boot.ci (from the boot package).
    It calculates simple percentile intervals, normal
    approximation intervals, and intervals corrected
    for skewness and median bias (BCa).
  • If variances of the bootstrap estimates are
    supplied then studentized (t-distribution based)
    intervals are also given.
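
A minimal sketch of a boot.ci call for the mean of the example data. The seed and R = 2000 are arbitrary; the studentized type ("stud") is left out because it needs the statistic function to return a variance estimate as well, which mnboot does not do.

```r
library(boot)
set.seed(6)  # arbitrary seed, used only for reproducibility

d1 <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)
mnboot <- function(d, nn) mean(d[nn])   # statistic: mean of the resampled points

mnb <- boot(d1, mnboot, R = 2000)

# Normal approximation, basic, percentile and BCa intervals at the 95% level
ci <- boot.ci(mnb, conf = 0.95, type = c("norm", "basic", "perc", "bca"))
print(ci)

# The interval end points are stored in the returned object, e.g. for "perc"
ci$percent[4:5]
```

For a symmetric, well behaved statistic such as the mean the four intervals should be close to one another; larger differences between them are a sign of skewness or bias in the bootstrap distribution.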

22
Bootstrap Warnings.
  • The bootstrap technique can be used for well
    behaved statistics (functions of the
    observations). For example, the bootstrap does
    not seem to be good for extreme value estimation.
    In such cases simulation may be used to build the
    distribution functions.
  • The bootstrap can be sensitive to extreme
    outliers. It may be a good idea to deal with
    outliers before applying the bootstrap (or
    calculating any statistic), or to generate more
    bootstrap samples (say, instead of B we can
    generate (1+a)B) and then deal with outliers
    after the bootstrap estimation.
  • For complicated statistics and a large number of
    observations the bootstrap may be very time
    consuming. Normal approximations to the statistic
    may give reasonable results.
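
The first warning can be illustrated with the sample maximum. This is a sketch under made-up assumptions (a uniform sample of size 50, B = 5000, arbitrary seed): a resample contains the largest observed point with probability 1 - (1 - 1/n)^n, about 0.63, so the bootstrap distribution of the maximum piles most of its mass on a single value and can never exceed it.

```r
set.seed(7)  # arbitrary seed, used only for reproducibility

n <- 50
x <- runif(n)   # made-up sample from a continuous distribution
B <- 5000

# Bootstrap distribution of the sample maximum
max_star <- replicate(B, max(sample(x, size = n, replace = TRUE)))

# Fraction of resamples whose maximum equals the observed maximum
frac_equal <- mean(max_star == max(x))
c(observed = frac_equal, expected = 1 - (1 - 1 / n)^n)
```

The true sampling distribution of the maximum of a continuous variable has no such point mass, which is why the simple bootstrap is a poor guide here.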

23
References
  • Efron, B. (1979) Bootstrap methods: another look
    at the jackknife. Ann. Statist. 7, 1-26
  • Efron, B. and Tibshirani, R.J. (1993) An
    Introduction to the Bootstrap
  • Chernick, M.R. (1999) Bootstrap Methods: A
    Practitioner's Guide
  • Berthold, M. and Hand, D.J. (2003) Intelligent
    Data Analysis
  • Kendall's Advanced Theory of Statistics, Vol. 1
    and 2

24
Exercise 2
  • Differences between means and bootstrap
    confidence intervals
  • Two species (A and B) of trees were planted
    randomly. Each species had 10 plots. The average
    height for each plot was measured after 6 years.
    Analyse the differences in means.
  • A: 3.2 2.7 3.0 2.7 1.7 3.3 2.7 2.6 2.9 3.3
  • B: 2.8 2.7 2.0 3.0 2.1 4.0 1.5 2.2 2.7 2.5
  • Test the hypothesis H0: the means are equal,
    against H1: the means are not equal.
  • Use var.test for equality of variances and t.test
    for equality of means.
  • Use bootstrap distributions and define confidence
    intervals.
  • Calculate the power of the test.
  • Write a report.