Title: STAT 572: Bootstrap Project
1. STAT 572 Bootstrap Project
- Group Members
- Cindy Bothwell
- Erik Barry Erhardt
- Nina Greenberg
- Casey Richardson
- Zachary Taylor
2. Histograms of Complex Population Distribution
3. Histograms of Population Sampling Distribution of the Median and Estimated Bootstrap Sampling Distributions
4. What Is a Bootstrap?
- A resampling method that creates many samples from a single sample
- Generally, resampling is done with replacement
- Used to develop a sampling distribution of statistics such as the mean, median, proportion, and others
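The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's Matlab code; the function name `bootstrap_statistic` and the data values are hypothetical.

```python
import random
import statistics

def bootstrap_statistic(sample, statistic, n_resamples=1000, seed=0):
    """Return n_resamples values of `statistic`, each computed on a
    resample of the same size drawn with replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]

# Hypothetical data, for illustration only.
sample = [12, 15, 9, 30, 22, 18, 25, 14, 40, 11]

# The list of bootstrap medians approximates the sampling distribution
# of the median.
boot_medians = bootstrap_statistic(sample, statistics.median)
print(len(boot_medians), min(boot_medians), max(boot_medians))
```

Any other statistic (for example `statistics.mean`) can be passed in place of the median.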
5. The Bootstrap and Complex Surveys
- Number of bootstrap samples
- n = sample size, N = population size
- Possible resamples: n^n (example: n = 200, 200^200 ≈ 1.6 x 10^460)
- Too many possibilities, N!/(n!(N-n)!); limit to B, a large number (example: 1000), the Monte Carlo approximation
- Determine sampling distribution with parameters
- Calculate variance in the normal way
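The arithmetic and the Monte Carlo step can be checked with a short Python sketch. It assumes that "variance in the normal way" means the ordinary sample variance of the B bootstrap replicates; the data are simulated and purely hypothetical.

```python
import math
import random
import statistics

n = 200

# Size of n**n, written as mantissa x 10^exponent.
exponent = len(str(n ** n)) - 1
mantissa = 10 ** (n * math.log10(n) - exponent)
print(f"{n}^{n} is about {mantissa:.1f} x 10^{exponent}")

# Monte Carlo approximation: use only B resamples instead of all n**n.
rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(n)]  # hypothetical sample
B = 1000
boot_medians = [statistics.median(rng.choices(sample, k=n)) for _ in range(B)]

# Ordinary sample variance (divisor B - 1) of the bootstrap medians.
var_hat = statistics.variance(boot_medians)
print(f"bootstrap variance estimate of the median: {var_hat:.4f}")
```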
6. Advantages and Disadvantages
- Advantages
- Avoids the costs of taking new samples (estimates a sampling distribution when only one sample is available)
- Checking parametric assumptions
- Used when parametric assumptions cannot be made or are very complicated
- Estimation of variance in quantiles
- Disadvantages
- Relies on a representative sample
- Variability due to finite replications (Monte Carlo)
7. Computations
- With more computing power available, the bootstrap is possible for a large number of resamples
- Possible programs
- Matlab
- Minitab
- SAS
- Excel
- S-Plus
- SPSS
- Fathom
8. Bootstrap Using the SURVEY Program
- The main parameter of interest is the median price that all households in Lockhart City are willing to pay for cable.
- The price that a household is willing to pay for cable is positively correlated with average district house value.
- Districts in Lockhart City are divided into strata based on average house value.
- Estimate the variance and create a 95% CI
9. Lockhart City Strata Characteristics
- Take a stratified random sample of size 200 using proportional allocation.
- Using the stratified random sample, implement the general bootstrap procedure, BWO, and mirror-match.
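Proportional allocation gives each stratum a share of the 200 sampled units equal to its share of the population. The sketch below shows one way to compute the stratum sample sizes. The stratum names and sizes are hypothetical (the Lockhart City strata table is not reproduced here), and the largest-remainder rounding rule is an assumption, not necessarily what the project used.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their
    population sizes, rounding so that the allocations sum exactly to n."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_fraction[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical stratum population sizes, for illustration only.
stratum_sizes = {"low": 5130, "middle": 3250, "high": 1620}
allocation = proportional_allocation(stratum_sizes, 200)
print(allocation)
```

A simple random sample of the allocated size would then be drawn without replacement within each stratum.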
10. Variations of the Bootstrap in Strata
- General Bootstrap
- Mimic the original sampling method
- BWO: Bootstrap Without Replacement
- Grow the sample to the size of the population
- Mirror-Match
- Repeated miniature resamples
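As an illustration of the general bootstrap in strata, the sketch below resamples with replacement independently within each stratum and pools the results. It assumes proportional allocation, so that the pooled median can stand in for the population median without extra weights. The data and the function name are hypothetical.

```python
import random
import statistics

def stratified_bootstrap_medians(strata_samples, n_resamples=1000, seed=0):
    """For each bootstrap replicate, resample every stratum with replacement
    (keeping its original sample size), pool the strata, take the median."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_resamples):
        pooled = []
        for values in strata_samples.values():
            pooled.extend(rng.choices(values, k=len(values)))
        medians.append(statistics.median(pooled))
    return medians

# Hypothetical stratified sample (for example, cable prices), illustration only.
strata_samples = {
    "low": [18, 20, 22, 19, 25, 21],
    "middle": [28, 30, 27, 33],
    "high": [40, 45],
}
boot_medians = stratified_bootstrap_medians(strata_samples)
print(statistics.variance(boot_medians))
```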
11. BWO: Bootstrap Without Replacement
- Grow the sample to the size of the population
- For each stratum L, create a pseudo-population by replicating the sample k_L times.
- Resample n_L units from each stratum without replacement to obtain a single bootstrap sample for stratum L.
- Repeat a large number of times
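The steps above can be sketched as follows. This is a simplified version that assumes each stratum population size is an exact multiple of its sample size, so that the replication factor is an integer; the non-integer case is not handled. The data and the function names are hypothetical.

```python
import random
import statistics

def bwo_resample(stratum_sample, N_L, rng):
    """One BWO resample for a single stratum."""
    n_L = len(stratum_sample)
    if N_L % n_L != 0:
        raise ValueError("this sketch assumes N_L is a multiple of n_L")
    k_L = N_L // n_L
    pseudo_population = stratum_sample * k_L      # replicate the sample k_L times
    return rng.sample(pseudo_population, n_L)     # draw without replacement

def bwo_bootstrap_medians(strata, n_resamples=500, seed=0):
    """Pool one BWO resample per stratum and record the median, repeatedly."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_resamples):
        pooled = []
        for sample, N_L in strata:
            pooled.extend(bwo_resample(sample, N_L, rng))
        medians.append(statistics.median(pooled))
    return medians

# Hypothetical (sample, population size) pairs, for illustration only.
strata = [([21, 25, 19, 30], 40), ([35, 42, 38, 45, 50, 33], 60)]
bwo_medians = bwo_bootstrap_medians(strata)
print(statistics.variance(bwo_medians))
```

Building and sampling from the pseudo-population is the expensive part, which is consistent with computing time being a concern for this method.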
12. BWO Variable Definitions
13. Disadvantages of Extended BWO
- N_L must be known
- n_L and k_L are often non-integers
- Must bracket between integers if n_L and k_L are non-integer
- Computing time
14. Mirror-Match
- Repeated miniature resamples
- The resample size is determined to match the proportion of the original sample size to the population size (n_L/N_L).
- Using this resample size (written n'_L here to distinguish it from the original sample size n_L), we resample n'_L units (SRSWOR) from each stratum L.
- Repeat the previous step k_L times with replacement to obtain a single bootstrap sample for stratum L.
- Repeat a large number of times
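A rough sketch of one mirror-match resample for a single stratum follows. Two assumptions are made because the variable definitions are not reproduced here: the miniature resample size (called `m_L` in the code, a hypothetical name) is the original sample size times the sampling fraction, and the number of repetitions is the original sample size divided by `m_L`, so the bootstrap sample ends up about as large as the original. Both are simply rounded to integers, with no bracketing.

```python
import random

def mirror_match_resample(stratum_sample, N_L, rng):
    """One mirror-match bootstrap sample for a single stratum."""
    n_L = len(stratum_sample)
    # Miniature resample size: m_L / n_L is about n_L / N_L (assumption).
    m_L = max(1, round(n_L * n_L / N_L))
    # Number of miniature resamples (assumption: restore roughly n_L units).
    k_L = max(1, round(n_L / m_L))
    resample = []
    for _ in range(k_L):
        # Each miniature resample is drawn without replacement (SRSWOR);
        # the k_L draws are independent of one another.
        resample.extend(rng.sample(stratum_sample, m_L))
    return resample

# Hypothetical stratum: 20 sampled units out of a population of 100.
rng = random.Random(0)
stratum_sample = list(range(20))
mm_resample = mirror_match_resample(stratum_sample, 100, rng)
print(len(mm_resample))
```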
15. Mirror-Match Variable Definitions
16. Mirror-Match Disadvantages
- N_L must be known
- k_L is often non-integer
- Must bracket between integers when k_L is non-integer
- Computing time
17. Estimation of the Population Sampling Distributions
- 100,000 independent stratified random samples.
- Medians computed and plotted to form empirical sampling distributions.
- Variables: house value, cable price, and TV hours.
18. Estimation of the Population Sampling Distributions
19. Simulations
- Matlab code: General, BWO, and Mirror-match.
- Two independent stratified random samples from Lockhart City.
- Comparison of the sample bootstrap sampling distributions with the population sampling distributions.
- 95% confidence intervals were determined from the bootstrap 2.5 and 97.5 percentiles.
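The percentile interval described in the last bullet can be sketched in Python as below. The sample is simulated and hypothetical, and a simple (unstratified) bootstrap is used for brevity. The percentile definition is the default of `statistics.quantiles`, which may differ slightly from the one used in the Matlab code.

```python
import random
import statistics

rng = random.Random(2)
sample = [rng.gauss(40, 8) for _ in range(200)]  # hypothetical sample

# Bootstrap distribution of the median.
boot_medians = [
    statistics.median(rng.choices(sample, k=len(sample))) for _ in range(1000)
]

# n=40 gives cut points every 2.5%, so the first and last cut points
# are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(boot_medians, n=40)
ci_low, ci_high = cuts[0], cuts[-1]
print(f"95% percentile interval: ({ci_low:.2f}, {ci_high:.2f})")
```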
20. Sampling Distributions 1
21. Sampling Distributions 2
22. Confidence Intervals
23. The Empirical versus the Bootstrap Sampling Distributions
- Bootstrap sampling distributions are expected to mimic actual sampling distributions.
- Bootstrap sampling is sensitive to individual samples.
- The shape of bootstrap sampling distributions may vary, but the statistic of interest and its variance are considered accurate.
24. Comparison of Bootstrap Methods
25. Empirical Coverages
- The empirical coverages were close to the expected 95%. They differed very little between the different bootstrap procedures.
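Empirical coverage is the fraction of repeated samples whose confidence interval contains the true population value. The sketch below illustrates the calculation on a simulated, hypothetical population. It uses simple random samples and a plain bootstrap rather than the stratified designs studied here, and far fewer replications, so it only shows the mechanics.

```python
import random
import statistics

def percentile_ci(sample, rng, n_resamples=200):
    """95% percentile bootstrap interval for the median of `sample`."""
    n = len(sample)
    boot = [statistics.median(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    cuts = statistics.quantiles(boot, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]

rng = random.Random(3)
population = [rng.lognormvariate(3.0, 0.5) for _ in range(10_000)]  # hypothetical
true_median = statistics.median(population)

n_samples = 200
hits = 0
for _ in range(n_samples):
    sample = rng.sample(population, 50)        # one new sample from the population
    low, high = percentile_ci(sample, rng)
    hits += low <= true_median <= high         # does the interval cover the truth?

coverage = hits / n_samples
print(f"empirical coverage: {coverage:.3f}")
```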
26. Empirical Coverages
- Empirical coverages depend on the type of confidence interval selected.
- Our confidence intervals were calculated from the 2.5 and 97.5 percentiles of each bootstrap distribution.
- There are many different types of bootstrap confidence intervals. The one we selected, although intuitive in design, is generally considered biased (Bedrick 2006).
27. Computer Processing Times
- Computer processing times varied greatly.
- Mean processing time per sample in seconds.
28. Computer Processing Times
- BWO took 381 times as long as the general bootstrapping procedure.
- Mirror-match took 293 times as long as the general bootstrapping procedure.
- For our study, BWO and mirror-match conferred no advantage over general bootstrapping with regard to statistical estimates. However, their vastly greater processing times are a great disadvantage.
29. CONCLUSIONS: General Bootstrap versus BWO and Mirror-Match
- BWO and Mirror-match procedures are designed to mimic complex sampling designs.
- We only analyzed stratified samples of 200 from a fictitious city.
- BWO and Mirror-match methods may be advantageous in other complex sampling scenarios.