STAT 572: Bootstrap Project - PowerPoint PPT Presentation

Author: Sallie. Last modified by: Erik Erhardt. Created: 4/21/2006 7:33:47 PM. Format: PowerPoint presentation.

1
STAT 572 Bootstrap Project
  • Group Members
  • Cindy Bothwell
  • Erik Barry Erhardt
  • Nina Greenberg
  • Casey Richardson
  • Zachary Taylor

2
Histograms of Complex Population Distribution
3
Histograms of Population Sampling Distribution of
the Median and Estimated Bootstrap Sampling
Distributions
4
What is a Bootstrap
  • A resampling method that creates many samples
    from a single sample
  • Generally, resampling is done with replacement
  • Used to develop the sampling distribution of a
    statistic such as the mean, median, proportion,
    or others.
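
As a minimal sketch of the idea above, the following Python code resamples a single sample with replacement and collects the median of each resample. The data values, the function name bootstrap_medians, and the choice B = 1000 are assumptions made for illustration; they do not come from the project.

```python
import random
import statistics

def bootstrap_medians(sample, B=1000, seed=0):
    """Return B medians, each computed from a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    # rng.choices draws n values with replacement from the original sample.
    return [statistics.median(rng.choices(sample, k=n)) for _ in range(B)]

# Hypothetical data, for illustration only.
sample = [12, 15, 9, 22, 30, 18, 25, 11, 14, 20]
medians = bootstrap_medians(sample, B=1000)

# The spread of the bootstrap medians estimates the variance of the sample median.
variance_estimate = statistics.variance(medians)
```

The list medians is the estimated bootstrap sampling distribution of the median; a histogram of it is the kind of plot shown on the earlier slides.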

5
The Bootstrap and Complex Surveys
  • Number of bootstrap samples
  • n = sample size, N = population size
  • Possible resamples: n^n (example: n = 200,
    200^200 ≈ 1.6 x 10^460)
  • Too many possibilities, N!/(n!(N-n)!); limit to B,
    a large number (example: 1000). This is the Monte
    Carlo approximation
  • Determine sampling distribution with parameters
  • Calculate variance in the normal way
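
The size of n^n for n = 200 can be checked with a short Python calculation. This is only an arithmetic check of the example above; the variable names are illustrative.

```python
import math

n = 200
log10_count = n * math.log10(n)        # log10 of n**n
exponent = math.floor(log10_count)     # power of ten
mantissa = 10 ** (log10_count - exponent)

print(f"{n}^{n} is about {mantissa:.1f} x 10^{exponent}")
# prints "200^200 is about 1.6 x 10^460"
```

Enumerating that many resamples is not feasible, which is why only B randomly chosen resamples are used.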

6
Advantages and Disadvantages
  • Advantages
  • Avoids the costs of taking new samples (Estimate
    a sampling distribution when only one sample is
    available)
  • Checking parametric assumptions
  • Used when parametric assumptions cannot be made
    or are very complicated
  • Estimation of variance in quantiles
  • Disadvantages
  • Relies on a representative sample
  • Variability due to finite replications (Monte
    Carlo)

7
Computations
  • With more computing power available, bootstrap is
    possible for a large number of resamples
  • Possible programs
  • Matlab
  • Minitab
  • SAS
  • Excel
  • S-Plus
  • SPSS
  • Fathom

8
Bootstrap using SURVEY program
  • Main parameter of interest is the median price
    that all households in Lockhart City are willing
    to pay for cable.
  • The price that a household is willing to pay for
    cable is positively correlated with
    average-district house value.
  • Districts in Lockhart City are divided into
    strata based on average house value.
  • Estimate the variance and create a 95% CI

9
Lockhart City Strata Characteristics
  • Take a stratified random sample of size 200 using
    proportional allocation.
  • Using the stratified random sample, implement the
    general bootstrap procedure, BWO, and
    mirror-match.
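
Proportional allocation splits the total sample size across strata in proportion to each stratum's population size. The sketch below shows one way to do this in Python. The stratum names and sizes are hypothetical (the Lockhart City table is not reproduced in this transcript), and the largest-remainder rounding rule is an assumption, not necessarily what the project used.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to N_L."""
    N = sum(stratum_sizes.values())
    exact = {L: n * N_L / N for L, N_L in stratum_sizes.items()}
    alloc = {L: int(x) for L, x in exact.items()}   # round down first
    # Hand any leftover units to the strata with the largest remainders.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda L: exact[L] - alloc[L], reverse=True)
    for L in by_remainder[:leftover]:
        alloc[L] += 1
    return alloc

# Hypothetical stratum population sizes (not the Lockhart City figures).
stratum_sizes = {"low": 5200, "middle": 3100, "high": 1700}
alloc = proportional_allocation(stratum_sizes, n=200)
# alloc == {"low": 104, "middle": 62, "high": 34}
```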

10
Variations of the Bootstrap in Strata
  • General Bootstrap
  • Mimic the original sampling method
  • BWO: Bootstrap Without Replacement
  • Grow the sample to the size of the population
  • Mirror-Match
  • Repeated miniature resamples
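
A general bootstrap that mimics stratified sampling can be sketched as resampling with replacement inside each stratum, keeping each stratum's sample size fixed. In the Python sketch below, the data are hypothetical, and taking the unweighted median of the pooled resample is an assumption that is reasonable only when the allocation is proportional.

```python
import random
import statistics

def stratified_bootstrap_medians(strata, B=1000, seed=0):
    """Bootstrap the pooled median, resampling within each stratum."""
    rng = random.Random(seed)
    medians = []
    for _ in range(B):
        pooled = []
        for values in strata.values():
            # Same stratum sample size as the original, drawn with replacement.
            pooled.extend(rng.choices(values, k=len(values)))
        medians.append(statistics.median(pooled))
    return medians

# Hypothetical stratified sample: cable prices by house-value stratum.
strata = {"low": [18, 20, 22, 25, 19, 21], "high": [30, 35, 33, 40]}
medians = stratified_bootstrap_medians(strata, B=500)
```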

11
BWO: Bootstrap Without Replacement
  • Grow the sample to the size of the population
  • For each stratum L, create a pseudo-population by
    replicating the sample kL times.
  • Resample nL units from each stratum without
    replacement to obtain a single bootstrap sample
    for stratum L.
  • Repeat a large number of times
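
The BWO steps above can be sketched in Python as follows. This is a simplified illustration that assumes kL is an integer for every stratum; it does not implement the bracketing needed for non-integer values. The data, the kL values, and the function names are hypothetical.

```python
import random
import statistics

def bwo_resample(values, k, rng):
    """One BWO resample for a stratum: replicate the sample k times,
    then draw len(values) units without replacement."""
    pseudo_population = list(values) * k
    return rng.sample(pseudo_population, len(values))

def bwo_medians(strata, k_by_stratum, B=1000, seed=0):
    """Repeat the BWO resample B times and record the pooled median."""
    rng = random.Random(seed)
    medians = []
    for _ in range(B):
        pooled = []
        for L, values in strata.items():
            pooled.extend(bwo_resample(values, k_by_stratum[L], rng))
        medians.append(statistics.median(pooled))
    return medians

# Hypothetical sample; the same k in both strata corresponds to
# proportional allocation, so the pooled median is left unweighted.
strata = {"low": [18, 20, 22, 25, 19, 21], "high": [30, 35, 33, 40]}
k_by_stratum = {"low": 10, "high": 10}
medians = bwo_medians(strata, k_by_stratum, B=500)
```

Because the draw is made without replacement from the replicated sample, each original unit can appear at most k times in one resample.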

12
BWO Variable Definitions
13
Disadvantages of extended BWO
  • NL must be known
  • nL and kL are often non-integers
  • Must bracket between integers if nL and kL are
    non-integer
  • Computing time

14
Mirror-Match
  • Repeated miniature resamples
  • Resample size is determined to match the
    ratio of the original sample size to the
    population size in each stratum (nL/NL).
  • Draw a resample of that size from each stratum L
    by simple random sampling without replacement
    (SRSWOR).
  • Repeat the previous step kL times, with
    replacement between repetitions, to obtain a
    single bootstrap sample for stratum L.
  • Repeat a large number of times
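
A rough Python sketch of one mirror-match resample for a single stratum follows. The slide that defines the resample size and kL is not reproduced in this transcript, so both are passed in as arguments here, and kL is assumed to be an integer. The data and the function name are hypothetical.

```python
import random

def mirror_match_resample(values, resample_size, k, rng):
    """Build one bootstrap sample for a stratum from k miniature resamples.

    Each miniature resample is drawn without replacement (SRSWOR), but the
    k draws are independent, so a unit may recur across miniature resamples.
    """
    out = []
    for _ in range(k):
        out.extend(rng.sample(values, resample_size))
    return out

# Hypothetical stratum sample, resample size, and k.
rng = random.Random(0)
stratum_sample = [18, 20, 22, 25, 19, 21]
resample = mirror_match_resample(stratum_sample, resample_size=2, k=3, rng=rng)
```

The statistic of interest would then be computed from the resamples of all strata combined, and the whole procedure repeated a large number of times.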

15
Mirror-Match Variable Definitions
16
Mirror Match Disadvantages
  • NL must be known
  • kL is often non-integer
  • Must bracket between integers when kL is
    non-integer
  • Computing time

17
Estimation of the Population Sampling
Distributions
  • 100,000 independent stratified random samples.
  • Medians computed and plotted to form empirical
    sampling distributions.
  • Variables: house value, cable price, and TV hours.
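
The empirical sampling distribution described above can be sketched as repeated stratified sampling from a known population. The Python code below uses a small hypothetical population and 2,000 replicates instead of 100,000 to keep it quick. The stratum names, sizes, and distributions are all assumptions for illustration.

```python
import random
import statistics

def sampling_distribution_of_median(population, alloc, reps, seed=0):
    """Draw `reps` independent stratified samples and record each median."""
    rng = random.Random(seed)
    medians = []
    for _ in range(reps):
        sample = []
        for L, units in population.items():
            # Simple random sample without replacement within stratum L.
            sample.extend(rng.sample(units, alloc[L]))
        medians.append(statistics.median(sample))
    return medians

# Hypothetical population with two strata (not the Lockhart City data).
gen = random.Random(1)
population = {
    "low": [gen.gauss(20, 5) for _ in range(500)],
    "high": [gen.gauss(35, 8) for _ in range(300)],
}
alloc = {"low": 25, "high": 15}   # proportional allocation of n = 40
medians = sampling_distribution_of_median(population, alloc, reps=2000)
```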

18
Estimation of the Population Sampling
Distributions
19
Simulations
  • Matlab code for the general, BWO, and mirror-match
    procedures.
  • Two independent stratified random samples from
    Lockhart City.
  • Comparison of the sample bootstrap sampling
    distributions with the population sampling
    distributions.
  • 95% confidence intervals were determined from the
    bootstrap 2.5th and 97.5th percentiles.
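
A percentile interval of this kind can be sketched in Python as below. The linear interpolation rule used for percentiles is an assumption, since percentile conventions differ between software packages, and the input values are placeholders, not project results.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (0 <= p <= 100) of sorted data."""
    pos = (len(sorted_values) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def percentile_interval(estimates, level=95):
    """Bootstrap percentile interval: cut (100 - level)/2 percent off each tail."""
    ordered = sorted(estimates)
    tail = (100 - level) / 2
    return percentile(ordered, tail), percentile(ordered, 100 - tail)

# Placeholder bootstrap estimates 0, 1, ..., 1000 to make the result easy to check.
estimates = list(range(1001))
interval = percentile_interval(estimates)   # (25.0, 975.0)
```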

20
Sampling Distributions 1
21
Sampling Distributions 2
22
Confidence Intervals
23
The Empirical versus the Bootstrap Sampling
Distributions
  • Bootstrap sampling distributions are expected to
    mimic actual sampling distributions.
  • Bootstrap sampling is sensitive to individual
    samples.
  • The shape of bootstrap sampling distributions may
    vary, but the statistic of interest and its
    variance are considered accurate.

24
Comparison of Bootstrap Methods
25
Empirical Coverages
  • The empirical coverages were close to the
    expected 95%. They differed very little between
    the different bootstrap procedures.
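
Empirical coverage is the fraction of confidence intervals, computed over many independent samples, that contain the true population value. The Python sketch below uses made-up intervals and a made-up true value purely to show the calculation.

```python
def empirical_coverage(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)

# Made-up intervals and true median, for illustration only.
intervals = [(1, 3), (2, 5), (4, 6), (0, 2.5)]
coverage = empirical_coverage(intervals, true_value=2.4)   # 0.75
```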

26
Empirical Coverages
  • Empirical coverages are dependent on the type of
    confidence interval that was originally selected.
  • Our confidence intervals were calculated from the
    2.5th and 97.5th percentiles of each bootstrap
    distribution.
  • There are many different types of bootstrap
    confidence intervals. The one we selected,
    although intuitive in design, is considered
    generally biased (Bedrick 2006).

27
Computer Processing Times
  • Computer processing times varied greatly.
  • Mean processing time per sample in seconds.

28
Computer Processing Times
  • BWO took 381 times as long as general
    bootstrapping procedures.
  • Mirror-match took 293 times as long as general
    bootstrapping procedures.
  • For our study, the BWO and mirror-match conferred
    no advantage over general bootstrapping with
    regard to statistical estimates. However, their
    vastly greater processing times are a great
    disadvantage.

29
Conclusions: General Bootstrap versus BWO and
Mirror-Match
  • BWO and Mirror-match procedures are designed to
    mimic complex sampling designs.
  • We only analyzed stratified samples of 200 from a
    fictitious city.
  • BWO and Mirror-match methods may be advantageous
    in other complex sampling scenarios.