No CLT - PowerPoint PPT Presentation

About This Presentation
Title:

No CLT

Description:

No CLT No Problem Enter the Bootstrap – PowerPoint PPT presentation

Slides: 38
Provided by: karlb1
Category:
Tags: clt | normal


Transcript and Presenter's Notes

Title: No CLT


1
No CLT No Problem? Enter the Bootstrap!
  • John McGready
  • Department of Biostatistics
  • Johns Hopkins University
  • http://www.biostat.jhsph.edu/jmcgread

2
Slide 2
3
Goals of Inferential Statistics
  • Much of what we do in statistics involves trying
    to talk about true characteristics of a process,
    using an imperfect subset of information from the
    process

Population Information (what we WANT)
Sample Information (what we have)
4
Medical Expenditures
  • Suppose we want to study the FY 2005 medical
    expenditures for 13,000 employees in a
    particular company
  • However, the benefits administrator will only
    give us one random sample of 200 employees

5
Medical Expenditures
(True) mean 2.3 (True) sd 5.0
Median 0.59, Mean 2.3, sd 5.0
(Sample) mean 1.9 (Sample) sd 4.0
Median 0.57, Mean 2.0, sd 4.3
6
Medical Expenditures
  • Given the right skew, our first choice for
    estimating the center of the distribution is to
    work with the median
  • We can only estimate the true median using the
    sample median from our 200 observations

7
Medical Expenditures
  • We are interested in how good a guess the
    sample median is of the true median
  • We would also like to estimate a range of
    possibilities for the true median (i.e., a
    confidence interval)

8
Medical Expenditures
  • In order to understand how a sample median from
    200 observations relates to the true median, let's
    call our administrator and see if we can get
    1,000 more random samples of size 200
  • This way, we can compute 1,000 more sample
    medians and see how variable they are

9
Making the Call
10
The Response
No Way!
11
What to Do Now??
  • Well, it seems we are out of luck
  • Let's just estimate the mean instead, and use the
    Central Limit Theorem to estimate a range of
    possible values for the true mean

12
Review: Sampling Behavior via the CLT
Standard error (spread)
13
Sampling Behavior via the CLT
  • Most (95%) of the sample means we could get from
    samples of 200 would fall between the 2.5th and
    97.5th percentiles of this distribution
  • These percentiles correspond to true mean +/-
    1.96 standard errors

14
Sampling Behavior via the CLT
15
Sampling Behavior via the CLT
  • Rub #1
  • If we knew the true mean, we wouldn't care about
    possible mean values
  • However, taking this one step further implies
    that 95% of the sample means we could get will
    fall within a known range of the truth

16
Sampling Behavior via the CLT
17
Sampling Behavior via the CLT
18
Sampling Behavior via the CLT
  • Rub #2
  • If we only have one sample, we don't know the
    true sampling distribution
  • However, the CLT says it will be normal
  • We estimate the spread from our sample data, and
    center it at our sample mean

19
Sampling Behavior via the CLT
  • Our sample info
  • Sample mean 2.0 (thousand $)
  • Sample standard deviation 4.3 (thousand $)
  • Sample estimate of standard error (spread of
    sampling distribution): 4.3 / sqrt(200) ≈ 0.30
    (thousand $)

20
Sampling Behavior via the CLT
21
Sampling Behavior via the CLT
22
Sampling Behavior via the CLT
  • True 95% CI
  • Sample mean +/- 1.96(true standard error)
  • (1.3, 2.7)
  • Estimated 95% CI
  • Sample mean +/- 1.97(estimated standard error)
  • (1.4, 2.6)
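
The two intervals above can be reproduced from the numbers quoted on the slides. The following is a minimal sketch, assuming the standard error is sd / sqrt(n) with n = 200; the variable and function names are illustrative and not from the presentation.

```python
import math

n = 200             # sample size from the slides
sample_mean = 2.0   # thousand $
sample_sd = 4.3     # thousand $
true_sd = 5.0       # thousand $, known here only because this is a teaching example

def interval(center, multiplier, se):
    """Return (lower, upper) for center +/- multiplier * se."""
    half_width = multiplier * se
    return (center - half_width, center + half_width)

true_se = true_sd / math.sqrt(n)    # standard error using the true sd
est_se = sample_sd / math.sqrt(n)   # standard error estimated from the sample

true_ci = interval(sample_mean, 1.96, true_se)
est_ci = interval(sample_mean, 1.97, est_se)

print(f"true SE {true_se:.3f}, CI ({true_ci[0]:.1f}, {true_ci[1]:.1f})")
print(f"estimated SE {est_se:.3f}, CI ({est_ci[0]:.1f}, {est_ci[1]:.1f})")
```

The multiplier 1.97 is taken from the slide as given; it is close to the t-distribution cutoff one would expect for 199 degrees of freedom.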

23
Another Approach to Estimating Sampling
Distribution
  • Instead of relying on the CLT, how about we
    simulate the sampling distribution using just our
    sample of 200?
  • Treat our sample as truth
  • Resample multiple times (say 1,000), taking
    random draws of 200 with replacement
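
A single resampling step might look like the sketch below. It uses four placeholder labels rather than the 200 expenditure values, and the helper name `resample` and the seed are arbitrary choices made for illustration.

```python
import random

def resample(sample, rng):
    """Draw len(sample) values from sample, with replacement."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
original = ["S1", "S2", "S3", "S4"]  # placeholder labels, not real data
draw = resample(original, rng)
print(draw)
```

Because the draws are made with replacement, a resample usually repeats some observations and omits others, which is what makes one resample differ from the next.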

24
Resampling With Replacement
  • Original sample (n = 4)
  • Potential resample of same size

S1
S2
S3
S4
S2
S1
S3
25
Re-Sampling
26
Bootstrap Estimate of Sampling Distribution
  • Take 1,000 resamples
  • Compute the mean of each re-sample
  • Plot a distribution of the means
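
The three steps above can be sketched as follows. The actual expenditure data are not available here, so the sample is a stand-in of 200 right-skewed values drawn from a lognormal distribution (an assumption made only for illustration), and the "plot" is a crude text histogram.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed for repeatability

# Stand-in data: 200 right-skewed values. The lognormal shape is an assumption.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

def bootstrap_means(data, n_resamples, rng):
    """Mean of each of n_resamples same-size resamples drawn with replacement."""
    means = []
    for _ in range(n_resamples):
        resampled = rng.choices(data, k=len(data))
        means.append(statistics.fmean(resampled))
    return means

boot = bootstrap_means(sample, 1000, rng)

# Crude text histogram of the bootstrap means, 10 equal-width bins.
low, high = min(boot), max(boot)
bins = 10
width = (high - low) / bins
counts = [0] * bins
for m in boot:
    index = min(int((m - low) / width), bins - 1)
    counts[index] += 1
for i, count in enumerate(counts):
    print(f"{low + i * width:6.2f} | {'#' * (count // 10)}")
```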

27
Bootstrap Estimate of Sampling Distribution
28
Bootstrap Estimate of Sampling Distribution
29
Bootstrap 95% CIs
  • How to get a 95% CI from the bootstrap
    distribution?
  • Assume normality, but estimate the standard error
    from the bootstrap distribution (normal bootstrap
    method)
  • Pick off the 2.5th and 97.5th percentiles
    (bootstrap percentile method)
  • Pick off adjusted percentiles (bias-corrected and
    accelerated, BCa, method)
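
The first two methods can be sketched in a few lines. This is an illustration under stated assumptions: the data are a lognormal stand-in for the real sample, the helper names are hypothetical, and the percentile helper uses simple linear interpolation. BCa is not implemented here because it needs additional bias and acceleration estimates; SciPy's `scipy.stats.bootstrap` offers a `method='BCa'` option for that case.

```python
import random
import statistics

def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of pre-sorted values."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def normal_ci(estimate, boot_stats):
    """Normal bootstrap method: estimate +/- 1.96 * bootstrap standard error."""
    se = statistics.stdev(boot_stats)
    return (estimate - 1.96 * se, estimate + 1.96 * se)

def percentile_ci(boot_stats):
    """Percentile method: 2.5th and 97.5th percentiles of the bootstrap stats."""
    ordered = sorted(boot_stats)
    return (percentile(ordered, 2.5), percentile(ordered, 97.5))

rng = random.Random(2)  # arbitrary seed
# Stand-in sample; the lognormal shape is an assumption, not the slide data.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]
estimate = statistics.fmean(sample)
boot_stats = [
    statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(1000)
]

print("normal:    ", normal_ci(estimate, boot_stats))
print("percentile:", percentile_ci(boot_stats))
```

The normal interval is symmetric around the sample estimate by construction, while the percentile interval can be asymmetric when the bootstrap distribution is skewed.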

30
95% CIs
  • True mean: 2.3
  • Method: 95% CI
  • CLT Estimate: 1.40 - 2.60
  • Bootstrap Normal: 1.39 - 2.60
  • Bootstrap Percentile: 1.41 - 2.58
  • BCa: 1.47 - 2.68

31
We Could Do with 10,000 Resamples
32
Bootstrap 95% CIs: Mean
  • Empirical coverage probabilities (%) [1]
  • Method: 1K resamples / 10K resamples
  • CLT Estimate [2]: 93.4
  • Bootstrap Normal [2]: 93.2 / 92.5
  • Bootstrap Percentile: 92.4 / 91.6
  • BCa: 92.3 / 93.4
  • [1] To be thorough, should also look at average
    width
  • [2] Some intervals could contain illegal (negative)
    values
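
An empirical coverage probability is the fraction of repeated samples whose interval contains the true value. The sketch below shows the idea for the percentile method only. It assumes a lognormal population (so the true mean is known exactly), and it uses far fewer trials and resamples than the slides so that it runs quickly, so its output is not comparable to the figures above.

```python
import math
import random
import statistics

def percentile_ci(data, n_resamples, rng):
    """Rough 95% percentile interval for the mean, using order statistics."""
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    k = round(0.025 * n_resamples)  # number of values trimmed from each tail
    return means[k], means[-k - 1]

def coverage(n_trials, n, n_resamples, rng):
    """Fraction of simulated samples whose interval contains the true mean."""
    true_mean = math.exp(0.5)  # mean of a lognormal with mu=0, sigma=1
    hits = 0
    for _ in range(n_trials):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        lower, upper = percentile_ci(sample, n_resamples, rng)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / n_trials

rng = random.Random(3)  # arbitrary seed
rate = coverage(n_trials=100, n=200, n_resamples=200, rng=rng)
print(f"empirical coverage: {rate:.2f}")
```

With only 100 trials the estimate is noisy; a careful study would use many more trials and, as the footnote says, also compare average interval widths.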

33
What's the Big Deal?
  • Why not just use CLT?
  • For many statistics, we do not have a CLT (or
    good CLT) based approach
  • Median
  • Ratio of mean to sd
  • Correlation coefficients
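
Because the bootstrap only needs a way to recompute the statistic on each resample, the same routine can serve for any of the statistics listed above. Here is a minimal sketch with a hypothetical helper that accepts the statistic as a function; the data are again a lognormal stand-in, not the expenditure sample.

```python
import random
import statistics

def bootstrap_percentile_ci(data, statistic, n_resamples, rng, level=0.95):
    """Percentile-method interval for an arbitrary statistic."""
    stats = sorted(
        statistic(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    k = round((1 - level) / 2 * n_resamples)  # values trimmed from each tail
    return stats[k], stats[-k - 1]

def mean_to_sd(values):
    """Ratio of mean to standard deviation."""
    return statistics.fmean(values) / statistics.stdev(values)

rng = random.Random(4)  # arbitrary seed
# Stand-in right-skewed sample; the lognormal shape is an assumption.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

median_ci = bootstrap_percentile_ci(sample, statistics.median, 1000, rng)
ratio_ci = bootstrap_percentile_ci(sample, mean_to_sd, 1000, rng)

print("median:    ", median_ci)
print("mean / sd: ", ratio_ci)
```

Passing the statistic as a function keeps the resampling loop identical across cases, which is the practical appeal when no CLT-based formula is at hand.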

34
Getting a 95% CI for a Median
35
95% CIs for Median
  • True median: 0.59
  • Method: 95% CI (1,000 reps)
  • CLT Estimate: NA
  • Bootstrap Normal: 0.44 - 0.71
  • Bootstrap Percentile: 0.39 - 0.68
  • BCa: 0.39 - 0.68

36
Bootstrap 95% CIs: Median
  • Empirical coverage probabilities (%) [1]
  • Method: 1K resamples / 10K resamples
  • Bootstrap Normal [2]: 94.1 / 94.4
  • Bootstrap Percentile: 93.9 / 95.0
  • BCa: 94.0 / 95.2
  • [1] To be thorough, should also look at average
    width
  • [2] Some intervals could contain illegal (negative)
    values

37
Wrap Up
  • Pros/cons of the bootstrap
  • Theoretical justification