No CLT - PowerPoint PPT Presentation

About This Presentation

Title:

No CLT

Description:

No CLT, No Problem? Enter the Bootstrap! – PowerPoint PPT presentation


Slides: 38

Provided by: karlb1

Learn more at: https://www.biostat.jhsph.edu


Tags: clt | normal


Transcript and Presenter's Notes


1
No CLT, No Problem? Enter the Bootstrap!

John McGready
Department of Biostatistics
Johns Hopkins University
http://www.biostat.jhsph.edu/jmcgread

2
Slide 2
3
Goals of Inferential Statistics

Much of what we do in statistics involves trying
to talk about true characteristics of a process,
using an imperfect subset of information from the
process

Population Information (what we WANT)
Sample Information (what we have)
4
Medical Expenditures

Suppose we want to study the FY 2005 medical
expenditures for 13,000 employees in a
particular company
However, the benefits administrator will only
give us one random sample of 200 employees

5
Medical Expenditures
[Figure: distributions of FY 2005 medical expenditures, population vs. sample]
Population (13,000 employees): median 0.59, mean 2.3, sd 5.0
Sample (n = 200): median 0.57, mean 2.0, sd 4.3
(values in thousands of dollars)
6
Medical Expenditures

Given the right skew, our first choice for
estimating the center of the distribution is to
work with the median
We can only estimate the true median using the
sample median from our 200 observations

7
Medical Expenditures

We are interested in how good a guess the
sample median is of the true median
We would also like to estimate a range of
possibilities for the true median (i.e., a
confidence interval)

8
Medical Expenditures

In order to understand how a sample median from
200 observations relates to the true median, let's
call our administrator and see if we can get
1,000 more random samples of size 200
This way, we can compute 1,000 more sample
medians and see how variable they are
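If the administrator had agreed, the experiment would look like the sketch below. The real company data are not available, so the population here is a HYPOTHETICAL set of 13,000 right-skewed (lognormal) values; the lognormal shape, its parameters, the seeds, and the helper name `simulate_sample_medians` are all assumptions made for illustration.

```python
import random
import statistics

def simulate_sample_medians(population, n=200, n_samples=1000, seed=1):
    """Draw n_samples random samples of size n (without replacement)
    from the population and return the median of each sample."""
    rng = random.Random(seed)
    return [statistics.median(rng.sample(population, n)) for _ in range(n_samples)]

# HYPOTHETICAL population: 13,000 right-skewed expenditures (thousands of dollars)
# drawn from a lognormal distribution; this is not the company's actual data.
pop_rng = random.Random(0)
population = [pop_rng.lognormvariate(0.0, 1.0) for _ in range(13_000)]

medians = simulate_sample_medians(population)
cuts = statistics.quantiles(medians, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
print("true median:", round(statistics.median(population), 2))
print("sd of the 1,000 sample medians:", round(statistics.stdev(medians), 3))
print("middle 95% of sample medians:", round(cuts[0], 2), "to", round(cuts[-1], 2))
```

The standard deviation of the 1,000 medians plays the role of the standard error of the median, and the 2.5th and 97.5th percentiles bound the middle 95% of possible sample medians.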

9
Making the Call
10
The Response
No Way!
11
What to Do Now??

Well, it seems we are out of luck
Let's just estimate the mean instead, and use the
Central Limit Theorem to estimate a range of
possible values for the true mean

12
Review: Sampling Behavior via the CLT
Standard error (spread)
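The CLT result being reviewed can be written out using the population values from the earlier slide (true mean 2.3, true sd 5.0, in thousands of dollars, n = 200). Here the second argument of N is the standard deviation of the sampling distribution, i.e. the standard error.

```latex
\bar{X} \;\dot\sim\; N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right),
\qquad
SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{5.0}{\sqrt{200}} \approx 0.354 \text{ (thousands of dollars)}
```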
13
Sampling Behavior via the CLT

Most (95%) of the sample means we could get from
samples of 200 would fall between the 2.5th and
97.5th percentiles of this distribution
These percentiles correspond to true mean ±
1.96 standard errors
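A small sketch of that calculation, plugging in the true mean 2.3 and true sd 5.0 (thousands of dollars) with n = 200. The helper name `central_range` is hypothetical; the 1.96 multiplier comes from the standard normal's 97.5th percentile, obtained here with the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def central_range(center, sd, n, level=0.95):
    """Return center ± z * sd/sqrt(n), the interval holding the middle
    `level` fraction of a normal sampling distribution for the mean."""
    se = sd / sqrt(n)                          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return center - z * se, center + z * se

low, high = central_range(2.3, 5.0, 200)
print(f"({low:.2f}, {high:.2f})")  # (1.61, 2.99)
```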

14
Sampling Behavior via the CLT
15
Sampling Behavior via the CLT

Rub 1
If we knew the true mean, we wouldn't care about
the range of possible sample means
However, taking this one step further implies
that 95% of the sample means we could get will fall
within a known distance (1.96 standard errors) of the truth
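The "one step further" argument can be checked by simulation: if 95% of sample means land within 1.96 standard errors of the truth, then intervals of ±1.96 standard errors built around those sample means should capture the truth about 95% of the time. The population below is HYPOTHETICAL (lognormal, an assumption), the function name `ci_coverage` is illustrative, and the true sd is used for the standard error, matching the argument on this slide.

```python
import random
import statistics
from math import sqrt

def ci_coverage(population, n=200, reps=2000, z=1.96, seed=2):
    """Fraction of random samples whose interval xbar ± z*sigma/sqrt(n)
    contains the true population mean (sigma = the true population sd)."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    se = statistics.pstdev(population) / sqrt(n)  # true standard error
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.sample(population, n))
        if xbar - z * se <= mu <= xbar + z * se:   # same as |xbar - mu| <= z*se
            hits += 1
    return hits / reps

# HYPOTHETICAL right-skewed population of 13,000 values.
pop_rng = random.Random(0)
population = [pop_rng.lognormvariate(0.0, 1.0) for _ in range(13_000)]

coverage = ci_coverage(population)
print("share of intervals containing the true mean:", coverage)
```

With a skewed population the observed share can sit slightly off 0.95, because the CLT normal shape is only an approximation at n = 200.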

16
Sampling Behavior via the CLT
17
Sampling Behavior via the CLT
18
Sampling Behavior via the CLT

Rub 2
If we only have one sample, we don't know the true
sampling distribution
However, the CLT says it will be approximately normal
We can estimate its spread from our sample data, and center it at
our sample mean

19
Sampling Behavior via the CLT

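Our sample info
Sample mean: 2.0 (thousands of dollars)
Sample standard deviation: 4.3 (thousands of dollars)
Sample estimate of the standard error (spread of the
sampling distribution):
$\widehat{SE} = s/\sqrt{n} = 4.3/\sqrt{200} \approx 0.30$ (thousands of dollars)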

20
Sampling Behavior via the CLT
21
Sampling Behavior via the CLT
22
Sampling Behavior via the CLT

True 95% CI
Sample mean ± 1.96 × (true standard error)
(1.3, 2.7)
Estimated 95% CI
Sample mean ± 1.97 × (estimated standard error), where 1.97 is the t multiplier for 199 degrees of freedom
(1.4, 2.6)
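Both intervals can be reproduced from the numbers on these slides: sample mean 2.0, true sd 5.0, sample sd 4.3, n = 200, with multipliers 1.96 and 1.97. The helper name `mean_ci` is hypothetical.

```python
from math import sqrt

def mean_ci(xbar, se, multiplier):
    """Return the interval xbar ± multiplier * se."""
    return xbar - multiplier * se, xbar + multiplier * se

n = 200
xbar = 2.0               # sample mean (thousands of dollars)
true_se = 5.0 / sqrt(n)  # uses the true sd, 5.0
est_se = 4.3 / sqrt(n)   # uses the sample sd, 4.3

true_ci = mean_ci(xbar, true_se, 1.96)
est_ci = mean_ci(xbar, est_se, 1.97)
print([round(v, 1) for v in true_ci])  # [1.3, 2.7]
print([round(v, 1) for v in est_ci])   # [1.4, 2.6]
```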

23
Another Approach to Estimating the Sampling Distribution

Instead of relying on the CLT, how about we simulate the
sampling distribution using just our sample of
200?
Treat our sample as the truth
Resample multiple times (say 1,000), taking random
draws of 200 with replacement
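A minimal sketch of the resampling step just described, applied to the median. Because the 200 observed expenditures are not reproduced here, the sample is a HYPOTHETICAL lognormal stand-in; the function name, seeds, and distribution are assumptions. The percentile interval at the end is one common way to summarize the bootstrap distribution and is not necessarily the interval used on later slides.

```python
import random
import statistics

def bootstrap_medians(sample, n_boot=1000, seed=3):
    """Treat `sample` as the population: draw n_boot resamples of the same
    size WITH replacement and return the median of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.median(rng.choices(sample, k=n)) for _ in range(n_boot)]

# HYPOTHETICAL stand-in for the 200 observed expenditures (thousands of dollars).
data_rng = random.Random(0)
sample = [data_rng.lognormvariate(0.0, 1.0) for _ in range(200)]

boot = bootstrap_medians(sample)
se_median = statistics.stdev(boot)       # bootstrap standard error of the median
cuts = statistics.quantiles(boot, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
ci = (cuts[0], cuts[-1])                 # percentile 95% interval
print("sample median:", round(statistics.median(sample), 2))
print("bootstrap SE:", round(se_median, 3))
print("95% percentile interval:", tuple(round(v, 2) for v in ci))
```

Sampling with replacement (`random.choices`) is what makes each resample differ from the original 200; sampling without replacement would just reshuffle the same values and always return the same median.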

24
Resampling With Replacement