Stats 120A - PowerPoint PPT Presentation

Slides: 36

Provided by: RobG96

Learn more at: http://www.stat.ucla.edu

1
Stats 120A

Review of CIs, hypothesis tests and more

2
Sample/Population

Last time we collected height/armspan data. Is
this a sample or a population?

3
Gallup Poll, 1/9/07

"As you may know, the Bush administration is
considering a temporary but significant increase
in the number of U.S. troops in Iraq to help
stabilize the situation there. Would you favor or
oppose this?"

4
Results

Results based on 1004 randomly selected adults (>
18 years) interviewed Jan 5-7, 2007.
61% are opposed.
"For results based on this sample, one can say
with 95% confidence that the maximum error
attributable to sampling and other random effects
is ±3 percentage points."

5
Pop Quiz

Is the value 61% a statistic or a parameter?
The margin of error is given as 3%. What does
the margin of error measure?
a) the variability in the sample
b) the variability in the population
c) the variability in repeated sampling

6
Sampling paradigm

In the U.S., the proportion of adults who are
opposed to a surge is p (or p*100%).
We take a random sample of n = 1004.
The proportion in our sample ("p hat") is an
estimate of the proportion in the population.

7
A simulation

Choose a value to serve as p (say p = .6)
Our "data" consist of 1004 numbers: 0's represent
those in favor, 1's are those opposed.
x: 589 out of 1004 say "opposed", so p-hat =
589/1004 = .5866
mean(x) = .5866
sd(x) = .4926
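
As a minimal sketch, the numbers on this slide can be reproduced in R from a 0/1 vector with 589 ones. Building x with rep() is an assumption made here for illustration; the slide does not show how x was created.

```r
# Hypothetical reconstruction of the data: 589 "opposed" (1), 415 "in favor" (0)
x <- c(rep(1, 589), rep(0, 1004 - 589))

phat <- mean(x)  # the mean of 0/1 data is the sample proportion
s <- sd(x)       # sample SD, close to sqrt(phat * (1 - phat))

print(round(phat, 4))
print(round(s, 3))
```

For 0/1 data the SD is determined by the proportion, which is why it comes out near .493 here.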

8
xbar = .5866, s = .493
9
How do we know sample proportion is a good
estimate of population proportion?

Law of Large Numbers
sample averages (and proportions) converge on
population values
implying that for finite values, the sample
proportion might be close if the sample size is
large

10
Coin flips: sample proportion "settles down" to
0.5
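
A plot like this one can be sketched in R by tracking the running proportion of heads. The seed and the number of flips (5000) are arbitrary choices made here, not values from the slides.

```r
set.seed(1)                                     # arbitrary seed, for reproducibility only
flips <- sample(c(0, 1), 5000, replace = TRUE)  # 1 = heads, fair coin
running <- cumsum(flips) / seq_along(flips)     # proportion of heads after each flip

plot(running, type = "l",
     xlab = "number of flips", ylab = "proportion of heads")
abline(h = 0.5, lty = 2)                        # the population value
```

Early values of running jump around; later values stay close to the dashed line.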
11
So if we stop earlier, say n = 10
p-hat = .60
12
Which raises the question

If we stop early, how far away will our sample
proportion be from the true value?
Or, in a survey setting, if we take a finite
sample of n = 1004, how far off from the population
proportion are we likely to be?

13
A simulation might help

Assume p = .60 (population proportion)
Take sample of n = 1004 and find p-hat.
Save this value
Repeat above 3 steps 10000 times.

14
The R code (for the record)

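phat <- c()
for (i in 1:10000) {
  # one survey of 1004 people: 0 = in favor (prob .4), 1 = opposed (prob .6)
  x <- sample(c(0,1), 1004, replace=TRUE, prob=c(.4,.6))
  temp <- sum(x)/1004     # sample proportion for this survey
  phat <- c(phat, temp)   # save it
}
hist(phat)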

15
each dot represents one survey of 1004 people
16
10,000 sample proportions, n = 1004
17
Observe that...

sample proportions are centered on the true
population value p = .60
variability is not great: smallest is .54,
biggest is .66
distribution is bell-shaped

18
We've just witnessed the Central Limit Theorem

If samples are independent and random and
sufficiently large
means (and proportions) follow a nearly Normal
distribution
the mean of the Normal is the mean of the
population
the SD of the Normal (aka the standard error) is
the population SD divided by sqrt(n)

19
CLT applied to sample proportions

phat has an approximately Normal distribution
mean is p
SE is sqrt(p(1-p)/n)
For our simulation, p = .60 so our p-hats will be
centered on .6 with an SD of sqrt(.6*.4/1004) =
0.0155

20
We saw

Normal
mean(phat) = 0.600 (expected .6)
sd(phat) = 0.01554 (expected 0.0155)
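
A comparison like this can be sketched in a few lines of R. This version swaps the explicit loop for rbinom(), which draws the count of "opposed" answers for each survey directly; the seed is arbitrary, so the simulated values will differ slightly from the ones above.

```r
set.seed(120)                                  # arbitrary seed
p <- 0.6
n <- 1004

phat <- rbinom(10000, size = n, prob = p) / n  # 10000 simulated sample proportions
se_theory <- sqrt(p * (1 - p) / n)             # SE predicted by the CLT

print(c(mean = mean(phat), sd = sd(phat), se_theory = se_theory))
```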

21
In practice, we don't know p

but we can get a good approximation to the
standard error using
sqrt(phat (1-phat)/n)
rather than
sqrt(p(1-p)/n)
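
To see how little is lost by plugging in phat, the two standard errors can be computed side by side. This is a small sketch; se() is a helper name introduced here, not something from the slides.

```r
n <- 1004

# hypothetical helper: standard error of a sample proportion
se <- function(prop, n) sqrt(prop * (1 - prop) / n)

se_true <- se(0.60, n)  # uses p, which is unknown in practice
se_plug <- se(0.61, n)  # uses phat from the poll

print(round(c(se_true, se_plug), 4))
```

The two values differ by less than 0.0001, so the plug-in version is a reasonable stand-in here.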

22
So if we take a random sample of n = 1004

and we see p-hat = .61, we know that
The true value of p can't be far away.
SE = sqrt(.61*.39/1004) = 0.0154
So 68% of the time we do this, p will be within
0.0154 of phat
And 95% of the time it will be within 2*0.0154,
or about 0.03

23
Which leads us to conclude

that the true proportion of the population that
opposes a surge is somewhere in the interval
.61 - .03 = 0.58
to .61 + .03 = 0.64

24
Confidence intervals

This is an example of a 95% confidence interval.
Because 95% of all samples will produce a p-hat
that is within 2 standard errors of the true
value, we are 95% confident that ours is a "good"
interval.

25
Formula

A 95% CI for a proportion is
estimate +/- 2 (Standard Error)
p-hat +/- 2*sqrt(phat(1-phat)/n)
0.61 +/- 2*sqrt(.61*.39/1004)
(.58, .64)
note: substituting phat for p in the SE means we get
an approximate value
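
The formula can be wrapped in a short R function. This is a sketch: prop_ci() and its mult argument are names made up here for illustration.

```r
# hypothetical helper: approximate CI for a proportion, estimate +/- mult * SE
prop_ci <- function(phat, n, mult = 2) {
  se <- sqrt(phat * (1 - phat) / n)  # plug-in standard error
  c(lower = phat - mult * se, upper = phat + mult * se)
}

ci <- prop_ci(0.61, 1004)
print(round(ci, 2))
```

Changing mult changes the confidence level; mult = 2 gives roughly 95%.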

26
What does 95% mean?

If we repeat this infinitely many times
take a sample of n = 1004 from population
calculate sample proportion
find an interval using +/- 2 SE
then 95% of these CIs will contain the truth and
5% will not.
We see only one: (.58, .64). It is either good
or bad, but we are confident it is good.
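
The repeated-sampling idea can be checked by simulation. The sketch below assumes p = .60 as in the earlier simulation, uses 10000 repetitions in place of "infinitely many", and picks an arbitrary seed.

```r
set.seed(2007)                                 # arbitrary seed
p <- 0.6
n <- 1004
reps <- 10000

phat <- rbinom(reps, size = n, prob = p) / n   # one p-hat per simulated survey
se <- sqrt(phat * (1 - phat) / n)              # plug-in SE for each survey
covered <- (phat - 2 * se <= p) & (p <= phat + 2 * se)

print(mean(covered))                           # fraction of intervals containing p
```

The printed fraction should land close to 0.95, though not exactly on it.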

27
Where did the 95% come from?

It came from the normal curve.
The CLT told us that p-hat followed an (approx)
normal distribution.
For Normals, 68% of probability is within 1
standard deviation of mean, 95% within 2, 99.7%
within 3.
A normal table gives other probabilities
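
In R, pnorm() and qnorm() play the role of the normal table. A brief sketch:

```r
k <- c(1, 2, 3)
inside <- pnorm(k) - pnorm(-k)  # probability within k SDs of the mean
print(round(inside, 4))

# going the other way: the multiplier needed for a given confidence level
z90 <- qnorm(0.95)              # leaves 5% in each tail
z95 <- qnorm(0.975)             # leaves 2.5% in each tail
print(round(c(z90, z95), 2))
```

The exact 95% multiplier is about 1.96; the slides round it to 2.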

28
Change confidence level by changing the width of
margin of error
[Figure: Normal curve centered at phat = 0.61, with 1 SE = 0.015. Marked regions: 68% within 1 SE, 90% within 1.6 SE, 95% within 2 SEs, 99.7% within 3 SEs.]
29
The CLT applies to

any linear combination of the observations
assuming observations are randomly sampled, and
independent
it does NOT matter what the distribution of the
population looks like
if n is small, the distribution will be only
approximately normal, and this might be a very
poor approximation
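
The effect of small n can be seen in a quick simulation. The sketch below assumes a strongly skewed population (exponential, chosen here only for illustration) and compares the skewness of sample means for n = 5 and n = 200; skew() is a rough helper defined here, not a built-in.

```r
set.seed(30)                                          # arbitrary seed

# hypothetical helper: rough sample skewness (near 0 for a symmetric bell shape)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

means_small <- replicate(5000, mean(rexp(5)))         # sample means, n = 5
means_large <- replicate(5000, mean(rexp(200)))       # sample means, n = 200

print(c(small = skew(means_small), large = skew(means_large)))
```

The n = 5 means stay clearly skewed, while the n = 200 means are much closer to symmetric.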

30
the CLT does NOT apply to

non-linear combinations, such as the sample
median or the standard deviation
non-random samples
samples that are dependent

31
simulation

http://onlinestatbook.com/stat_sim/sampling_dist/index.html

32
Summary

Confidence Level is a statement about the
sampling process, not the sample
Margin of error is determined to achieve the
desired confidence level
We can calculate the confidence level only if we
know the sampling distribution: the probability
distribution of the sample statistic

33
Pop Quiz

Is the value 61% a statistic or a parameter?
The margin of error is given as 3%. What does
the margin of error measure?
a) the variability in the sample
b) the variability in the population
c) the variability in repeated sampling

35
For next time

In WWII, the German army produced tanks with
sequential serial numbers. The Allies captured a
few tanks, and wanted to infer the total number
of tanks produced.
Suppose you had captured 10 tanks. Come up with
three estimators for the total number of tanks.
Data: 911, 5146, 6083, 944, 11944, 9365, 6087,
6647, 7076, 12275
