Bootstrap - PowerPoint PPT Presentation

1
Bootstrap
  • Chingchun Huang
  • Vision Lab, NCTU

2
Introduction
  • A data-based simulation method
  • Used for statistical inference:
  • Finding estimators of the parameter of interest
  • Assessing confidence in the parameter of interest

3
An example
  • Definitions of two statistics for a random variable
  • Average: the sample mean
  • Standard error: the standard deviation of the
    sample means
  • Calculating the two statistics
  • Carry out the measurement many times
  • Observations from these two statistics
  • The standard error decreases as N increases
  • The sample mean becomes more reliable as N increases
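
The two observations above can be checked with a small simulation. This is a minimal sketch, assuming standard normal measurements; the function name `standard_error_of_mean` and the choice of 2000 repetitions are illustrative and not from the slides.

```python
import random
import statistics

def standard_error_of_mean(n, repeats=2000, seed=0):
    """Repeat an n-point measurement many times and return the
    standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

# Assumed sample sizes; each step multiplies N by 4, which should
# roughly halve the standard error.
se_10 = standard_error_of_mean(10)
se_40 = standard_error_of_mean(40)
se_160 = standard_error_of_mean(160)
print(se_10, se_40, se_160)
```

For unit-variance data the printed values should sit close to 1/sqrt(N).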

4
Central limit theorem
  • Averages taken from any distribution with finite
    variance (your experimental data) will be
    approximately normally distributed when the
    sample is large
  • The error of such a statistic decreases slowly,
    in proportion to 1/sqrt(N), as the number of
    observations N increases
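
Stated as a formula, for i.i.d. observations with mean $\mu$ and finite variance $\sigma^2$ (the symbols $\mu$, $\sigma$ and $\bar{X}_N$ are notation introduced here, not taken from the slides):

```latex
\sqrt{N}\,\bigl(\bar{X}_N - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}(0, \sigma^2),
\qquad
\operatorname{se}\bigl(\bar{X}_N\bigr) = \frac{\sigma}{\sqrt{N}} .
```

Halving the standard error therefore takes four times as many observations.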

5
[Figure panels: normal distribution; averages of the normal distribution; χ² distribution; averages of the χ² distribution]
6
[Figure panels: uniform distribution; averages of the uniform distribution]
7
Consequences of central limit theorem
  • But nobody tells you how big the sample has to
    be.
  • Should we believe a measurement of the average?
  • What about statistics other than the average?

Bootstrap: the technique to the rescue
8
Basic idea of bootstrap
  • Originally, from some list of data, one computes
    an object (e.g. a statistic).
  • Create an artificial list by randomly drawing
    elements from that list, with replacement. Some
    elements will be picked more than once.
  • Nonparametric mode (later)
  • Parametric mode (later)
  • Compute the object again on the artificial list.
  • Repeat 100 to 1000 times and look at the
    distribution of these objects.
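
The loop described above can be sketched in a few lines. This is an illustration under assumptions: the data values are made up, the median is an arbitrary choice of "object", and `bootstrap_replications` is a hypothetical helper name.

```python
import random
import statistics

def bootstrap_replications(data, statistic, n_boot=1000, seed=0):
    """Recompute `statistic` on n_boot artificial lists, each drawn
    from `data` with replacement and of the same length as `data`."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # some elements repeat
        reps.append(statistic(resample))
    return reps

# Made-up measurements, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
reps = bootstrap_replications(data, statistics.median)

print("median of the data:", statistics.median(data))
print("bootstrap standard error:", statistics.stdev(reps))
```

The spread of `reps` is the bootstrap estimate of how much the statistic would vary from one sample to the next.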

9
A simple example
  • Data are available comparing grades before and
    after leaving graduate school
  • There is some linear correlation between the
    grades: r = 0.776
  • But how reliable is this result (r = 0.776)?
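
One way to probe that question is to bootstrap the correlation coefficient by resampling whole (before, after) pairs. The transcript does not include the grade data, so the sketch below uses synthetic pairs and will not reproduce r = 0.776; `pearson_r` and the data-generating constants are assumptions.

```python
import math
import random

def pearson_r(pairs):
    """Pearson correlation of a list of (x, y) pairs.
    Fails if all x values or all y values are identical."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Synthetic stand-in for the grade data (15 pairs).
gen = random.Random(1)
xs = [gen.gauss(0.0, 1.0) for _ in range(15)]
pairs = [(x, 0.8 * x + gen.gauss(0.0, 0.5)) for x in xs]

r0 = pearson_r(pairs)

# Resample pairs, not x and y separately, so each student's two
# grades stay together.
rng = random.Random(0)
reps = [pearson_r(rng.choices(pairs, k=len(pairs))) for _ in range(1000)]
mean_rep = sum(reps) / len(reps)
std_err = math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (len(reps) - 1))

print("r on the original pairs:", r0)
print("bootstrap standard error of r:", std_err)
```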

10

A simple example
13
A simple example
14
Confidence intervals
  • Consider a situation similar to the one before
  • The parameter of interest is $\theta$ (e.g. the mean)
  • $\hat{\theta}$ is an estimator of $\theta$ based on the
    sample.
  • We are interested in finding the confidence
    interval for the parameter.

15
The percentile algorithm
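  • Input the level $2\alpha$ for the confidence
    interval.
  • Generate $B$ bootstrap samples.
  • Compute $\hat{\theta}^*_b$ for $b = 1, \ldots, B$.
  • Arrange the new data set of $\hat{\theta}^*_b$ values in order.
  • Compute the $100\alpha$-th and $100(1 - \alpha)$-th
    percentiles of the new data.
  • The C.I. is given by (the $100\alpha$-th percentile,
    the $100(1 - \alpha)$-th percentile).
  • Example percentiles of $\hat{\theta}^*$:
      Percentile (%)   5     10    16    50    84     90     95
      Value            49.7  56.4  62.7  86.9  112.3  118.7  126.7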
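
The percentile algorithm can be sketched as follows. The data values are made up, and `percentile_interval` is a hypothetical name. Taking the B·α-th and B·(1 − α)-th ordered replications is one common convention, assumed here because the slide does not spell out the index rule.

```python
import random
import statistics

def percentile_interval(data, statistic, alpha=0.05, n_boot=2000, seed=0):
    """Percentile interval: the 100*alpha-th and 100*(1 - alpha)-th
    percentiles of the ordered bootstrap replications."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    # 1-based ranks of the two cut points, clamped to a valid range.
    k_lo = max(1, round(alpha * n_boot))
    k_hi = min(n_boot, round((1 - alpha) * n_boot))
    return reps[k_lo - 1], reps[k_hi - 1]

# Made-up data, for illustration only.
data = [94, 197, 16, 38, 99, 141, 23, 52, 104, 146, 10, 51, 30, 40, 27, 46]
lo, hi = percentile_interval(data, statistics.fmean, alpha=0.05)
print("90% percentile interval for the mean:", (lo, hi))
```

With `alpha=0.05` the interval runs from the 5th to the 95th percentile, so its nominal coverage is 90%.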

16
How many bootstraps ?
  • No clear answer to this.
  • Rule of thumb: try it 100 times, then 1000
    times, and see if your answers have changed by
    much.
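
The rule of thumb is easy to automate: compute the same bootstrap standard error for increasing B and watch it settle. This is a sketch with made-up data; `bootstrap_standard_error` and the list of B values are illustrative choices.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic, n_boot, seed=0):
    """Standard deviation of n_boot bootstrap replications."""
    rng = random.Random(seed)
    n = len(data)
    reps = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)

# Made-up data, for illustration only.
data = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 11.9, 9.5, 12.6, 10.3, 11.0]

results = {
    b: bootstrap_standard_error(data, statistics.fmean, b)
    for b in (100, 1000, 2000)
}
for b, se in results.items():
    print(b, round(se, 4))
```

If the values for B = 1000 and B = 2000 agree to the precision you care about, more replications are unlikely to change the answer.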

17
How many bootstraps ?
B            50      100     200     500     1000    2000    3000
Std. Error   0.0869  0.0804  0.0790  0.0745  0.0759  0.0756  0.0755
18
Convergence
  • These histograms show the distribution of the
    correlation coefficient over the bootstrap
    samples, here for B = 200 and B = 500.

19
Convergence (continued)
  • B = 1000 and B = 2000

20
Convergence (continued)
  • B = 3000 and B = 4000
  • The sampling distributions of the correlation
    coefficient are now more or less identical.

21
Convergence (continued)
  • The graph shows how similar the bootstrap
    distribution is to the direct enumeration of
    random samples from the empirical distribution.

22
Is it reliable ?
  • Observations
  • Good agreement for normal (Gaussian)
    distributions
  • Skewed distributions tend to be more problematic,
    particularly in the tails
  • A tip: for now, nobody is going to shoot you down
    for using it.

23
Schematic representation of bootstrap procedure
24
Bootstrap
  • The bootstrap can be used either
    non-parametrically or parametrically
  • In nonparametric mode, it avoids restrictive and
    sometimes dangerous parametric assumptions about
    the form of the underlying population.
  • In parametric mode it can provide more accurate
    estimates of errors than traditional methods.
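
The two modes differ only in where the artificial samples come from. The sketch below contrasts them for one statistic. The normal model in the parametric version is an assumption made for illustration, as are the data and both function names.

```python
import random
import statistics

def nonparametric_bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Resample the observed values themselves, with replacement."""
    rng = random.Random(seed)
    n = len(data)
    reps = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)

def parametric_bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Fit a normal model to the data (an assumption of this sketch),
    then draw each artificial sample from the fitted model."""
    rng = random.Random(seed)
    n = len(data)
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    reps = [
        statistic([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_boot)
    ]
    return statistics.stdev(reps)

# Made-up data, for illustration only.
data = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 11.9, 9.5, 12.6, 10.3, 11.0]
print("nonparametric:", nonparametric_bootstrap_se(data, statistics.median))
print("parametric:   ", parametric_bootstrap_se(data, statistics.median))
```

The parametric version is only as good as the model it simulates from. The nonparametric version makes no such assumption but can only reuse values that were actually observed.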

25
Parametric Bootstrap
Real World: (distribution) P -> x (samples) -> statistic of interest
Bootstrap World: estimated probability model -> bootstrap sample -> bootstrap replication
26
Bootstrap
  • The technique was extended, modified and
    refined to handle a wide variety of problems,
    including:
  • (1) confidence intervals and hypothesis tests,
  • (2) linear and nonlinear regression,
  • (3) time series analysis and other problems.

27
Example: one-dimensional smoothing
Fit a cubic spline (N = 50 training data points)
28
The bootstrap and maximum likelihood method
Least squares fit (the formulas on this slide are missing from the transcript)
29
The bootstrap and maximum likelihood method
Nonparametric bootstrap: repeat B = 200 times
- draw a dataset of size N = 50 with replacement
  from the training data $z_i = (x_i, y_i)$
- fit a cubic spline
Construct a 95% pointwise confidence interval:
at each $x_i$, compute the mean and find
the 2.5% and 97.5% percentiles
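
The procedure can be sketched with the standard library alone. To keep the example short it fits a straight line by least squares in place of the cubic spline, so it illustrates the resampling scheme, not the slide's smoother. The data, `fit_line`, `percentile` and `nonparametric_band` are all assumptions of this sketch.

```python
import random

def fit_line(points):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

def percentile(sorted_values, q):
    """Value at 1-based rank round(q * len), clamped to a valid index."""
    k = round(q * len(sorted_values))
    return sorted_values[min(len(sorted_values) - 1, max(0, k - 1))]

def nonparametric_band(points, n_boot=200, seed=0):
    """Pointwise 95% band: refit on resampled (x, y) pairs and take
    the 2.5% and 97.5% percentiles of the fitted values at each x_i."""
    rng = random.Random(seed)
    xs = [x for x, _ in points]
    curves = []
    for _ in range(n_boot):
        sample = rng.choices(points, k=len(points))  # with replacement
        a, b = fit_line(sample)
        curves.append([a + b * x for x in xs])
    band = []
    for j in range(len(xs)):
        values = sorted(curve[j] for curve in curves)
        band.append((percentile(values, 0.025), percentile(values, 0.975)))
    return band

# Synthetic training data, N = 50, for illustration only.
gen = random.Random(1)
points = [(3 * i / 49, 1.0 + 0.5 * (3 * i / 49) + gen.gauss(0.0, 0.3))
          for i in range(50)]
band = nonparametric_band(points)
print(band[0], band[-1])
```

Swapping `fit_line` for a spline fit leaves the resampling and percentile steps unchanged.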
30
The bootstrap and maximum likelihood method
Parametric bootstrap: we assume that the
model errors are Gaussian. Repeat B = 200 times
- draw a dataset of size N = 50 with replacement from the
  training data $z_i = (x_i, y_i)$
- fit a cubic spline on $z_i$ and estimate the error variance
- simulate new responses $z_i^* = (x_i, y_i^*)$
- fit a cubic spline on $z_i^*$
Construct a 95% pointwise confidence interval:
at each $x_i$, compute the mean and find the 2.5% and
97.5% percentiles
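
A sketch of the parametric version follows, with two stated simplifications. It uses a straight-line least-squares fit instead of a cubic spline. It also fits once on the original data and simulates Gaussian responses around that fit, which is a common textbook form and omits the initial resampling step listed above. All names and data are hypothetical.

```python
import math
import random

def fit_line(points):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

def parametric_band(points, n_boot=200, seed=0):
    """Pointwise 95% band from responses simulated with Gaussian errors."""
    rng = random.Random(seed)
    n = len(points)
    xs = [x for x, _ in points]
    a, b = fit_line(points)
    fitted = [a + b * x for x in xs]
    # Residual standard deviation; n - 2 because two coefficients are fitted.
    rss = sum((y - f) ** 2 for (_, y), f in zip(points, fitted))
    sigma = math.sqrt(rss / (n - 2))
    curves = []
    for _ in range(n_boot):
        simulated = [(x, f + rng.gauss(0.0, sigma)) for x, f in zip(xs, fitted)]
        a_star, b_star = fit_line(simulated)
        curves.append([a_star + b_star * x for x in xs])
    lo_rank = max(1, round(0.025 * n_boot))
    hi_rank = min(n_boot, round(0.975 * n_boot))
    band = []
    for j in range(n):
        values = sorted(curve[j] for curve in curves)
        band.append((values[lo_rank - 1], values[hi_rank - 1]))
    return band

# Synthetic training data, N = 50, for illustration only.
gen = random.Random(1)
points = [(3 * i / 49, 1.0 + 0.5 * (3 * i / 49) + gen.gauss(0.0, 0.3))
          for i in range(50)]
band = parametric_band(points)
print(band[0], band[-1])
```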
31
The bootstrap and maximum likelihood method
Parametric bootstrap
Conclusion: least squares = parametric
bootstrap as $B \to \infty$ (only because of Gaussian
errors)
32
Some notations
  • The bootstrap is
  • A computer-based method for assigning measures of
    accuracy to statistical estimates.
  • The basic idea behind the bootstrap is very simple,
    and goes back at least two centuries.
  • The bootstrap method is not a way of reducing
    the error! It only tries to estimate it.
  • Bootstrap methods depend only on the bootstrap
    samples. They do not depend on the underlying
    distribution.

33
A general data set-up
  • We have dealt with
  • The standard error
  • The confidence interval
  • Under the assumption that the distribution is
    either unknown or very complicated.
  • The situation can be more general,
  • Like regression,
  • Sometimes using maximum likelihood estimation.

34
Conclusion
  • The bootstrap allows the data analyst to
  • Assess the statistical accuracy of complicated
    procedures, by exploiting the power of the
    computer.
  • The use of the bootstrap either
  • Relieves the analyst from having to do complex
    mathematical derivations, or
  • Provides an answer where no analytical answer can
    be obtained.

35
Addendum: the jackknife
  • The jackknife is a special kind of bootstrap.
  • Each jackknife subsample has all but one of the
    original elements of the list.
  • For example, if the original list has 10 elements,
    then there are 10 jackknife subsamples.
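
A short sketch of the leave-one-out construction, using made-up data. The standard-error formula in `jackknife_standard_error` is the usual jackknife one; the slide does not state it, so it is an addition here, and both function names are hypothetical.

```python
import math
import statistics

def jackknife_subsamples(data):
    """All subsamples that omit exactly one element of `data`."""
    return [data[:i] + data[i + 1:] for i in range(len(data))]

def jackknife_standard_error(data, statistic):
    """Usual jackknife standard error:
    sqrt((n - 1) / n * sum((theta_i - theta_bar)^2))."""
    n = len(data)
    values = [statistic(sub) for sub in jackknife_subsamples(data)]
    center = sum(values) / n
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in values))

# Made-up list of 10 elements, giving 10 jackknife subsamples.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
subsamples = jackknife_subsamples(data)
print(len(subsamples), "subsamples of size", len(subsamples[0]))
print("jackknife s.e. of the mean:",
      jackknife_standard_error(data, statistics.fmean))
```

For the sample mean this formula reduces exactly to the familiar s/sqrt(n), which makes a convenient sanity check.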

36
Introduction (continued)
  • Definition of Efron's nonparametric bootstrap.
  • Given a sample of $n$ independent identically
    distributed (i.i.d.) observations $X_1, X_2, \ldots, X_n$
    from a distribution $F$ and a parameter $\theta$ of the
    distribution $F$ with a real-valued estimator
    $\hat{\theta}(X_1, X_2, \ldots, X_n)$, the bootstrap estimates the
    accuracy of the estimator by replacing $F$ with $F_n$,
    the empirical distribution, where $F_n$ places
    probability mass $1/n$ at each observation $X_i$.

37
Introduction (continued)
  • Let $X_1^*, X_2^*, \ldots, X_n^*$ be a bootstrap sample, that
    is, a sample of size $n$ taken with replacement from
    $F_n$.
  • The bootstrap estimates the variance of
    $\hat{\theta}(X_1, X_2, \ldots, X_n)$ by computing or approximating
    the variance of
    $\hat{\theta}^* = \hat{\theta}(X_1^*, X_2^*, \ldots, X_n^*)$.
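
Written out, the empirical distribution and the usual Monte Carlo approximation of that variance look as follows. The number of replications $B$, the index $b$ and the average $\bar{\theta}^*$ are notation assumed here.

```latex
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\},
\qquad
\widehat{\operatorname{Var}}\bigl(\hat{\theta}^*\bigr)
  = \frac{1}{B - 1} \sum_{b=1}^{B} \bigl(\hat{\theta}^*_b - \bar{\theta}^*\bigr)^2,
\qquad
\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b .
```

Here each $\hat{\theta}^*_b$ is the estimator evaluated on an independent bootstrap sample drawn from $F_n$.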

38
Introduction (continued)
  • The bootstrap is similar to earlier techniques
    which are also called resampling methods:
  • (1) jackknife,
  • (2) cross-validation,
  • (3) delta method,
  • (4) permutation methods, and
  • (5) subsampling.

39
Bootstrap Remedies
  • In the past decade, for many of the problems where
    the bootstrap is inconsistent, researchers have
    found remedies in the form of modified bootstrap
    procedures that are consistent.
  • For both problems described thus far, a simple
    procedure called the m-out-of-n bootstrap has been
    shown to lead to consistent estimates.

40
The m-out-of-n Bootstrap
  • This idea was proposed by Bickel and Ren (1996)
    for handling doubly censored data.
  • Instead of sampling n times with replacement from
    a sample of size n, they suggest doing it only m
    times, where m is much less than n.
  • To get the consistency results, both m and n need
    to get large, but at different rates. We need
    $m = o(n)$, that is, $m/n \to 0$ as $m$ and $n$ both $\to \infty$.
  • This method leads to consistent bootstrap
    estimates in many cases where the ordinary
    bootstrap has problems, particularly (1) the mean
    with infinite variance and (2) extreme value
    distributions.
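
The only mechanical change from the ordinary bootstrap is the resample size. The sketch below applies it to the sample maximum of synthetic uniform data. The choice m = 20 for n = 400 is an arbitrary illustration, and a full analysis would also rescale the statistic, which is omitted here. It counts how often a resample simply reproduces the sample maximum, one symptom of the ordinary bootstrap's trouble with extremes.

```python
import random

def m_out_of_n_replications(data, statistic, m, n_boot=1000, seed=0):
    """Bootstrap replications from resamples of size m, drawn with
    replacement from `data` (m = len(data) is the ordinary bootstrap)."""
    rng = random.Random(seed)
    return [statistic(rng.choices(data, k=m)) for _ in range(n_boot)]

# Synthetic sample of n = 400 uniform values, for illustration only.
gen = random.Random(1)
data = [gen.random() for _ in range(400)]
top = max(data)

ordinary = m_out_of_n_replications(data, max, m=len(data))
reduced = m_out_of_n_replications(data, max, m=20)  # m much smaller than n

frac_ordinary = sum(r == top for r in ordinary) / len(ordinary)
frac_reduced = sum(r == top for r in reduced) / len(reduced)
print("ordinary bootstrap, share equal to sample max:", frac_ordinary)
print("m-out-of-n bootstrap, share equal to sample max:", frac_reduced)
```

With m = n a large share of the replications pile up on a single value, while with m much smaller than n that point mass mostly disappears.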

Don't know why.
41
Examples where the bootstrap fails
  • Athreya (1987) shows that the bootstrap estimate
    of the sample mean is inconsistent when the
    population distribution has an infinite variance.
  • Angus (1993) provides similar inconsistency
    results for the maximum and minimum of a sequence
    of independent identically distributed
    observations.
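
For the maximum, the problem can be seen directly. If the observations have no ties, an ordinary bootstrap sample reproduces the sample maximum $X_{(n)}$ whenever that observation is drawn at least once. The notation $X_{(n)}$ and $X^*_{(n)}$ for the sample and resample maxima is assumed here.

```latex
P\bigl(X^*_{(n)} = X_{(n)}\bigr)
  = 1 - \Bigl(1 - \frac{1}{n}\Bigr)^{n}
  \;\longrightarrow\; 1 - e^{-1} \approx 0.632
  \quad \text{as } n \to \infty .
```

The bootstrap distribution of the maximum therefore keeps a point mass of about 0.632 at one value however large the sample is, whereas the true sampling distribution of the maximum of a continuous variable has no such atom.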
