Stats 120A - PowerPoint PPT Presentation

About This Presentation
Slides: 36
Provided by: RobG96
Learn more at: http://www.stat.ucla.edu

Transcript and Presenter's Notes


1
Stats 120A
  • Review of CIs, hypothesis tests and more

2
Sample/Population
  • Last time we collected height/armspan data. Is
    this a sample or a population?

3
Gallup Poll, 1/9/07
  • "As you may know, the Bush administration is
    considering a temporary but significant increase
    in the number of U.S. troops in Iraq to help
    stabilize the situation there. Would you favor or
    oppose this?"

4
Results
  • Results based on 1004 randomly selected adults
    (> 18 years) interviewed Jan 5-7, 2007.
  • 61% are opposed.
  • "For results based on this sample, one can say
    with 95% confidence that the maximum error
    attributable to sampling and other random effects
    is ±3 percentage points."

5
Pop Quiz
  • Is the value 61% a statistic or a parameter?
  • The margin of error is given as 3%. What does
    the margin of error measure?
  • a) the variability in the sample
  • b) the variability in the population
  • c) the variability in repeated sampling

6
Sampling paradigm
  • In the U.S., the proportion of adults who are
    opposed to a surge is p (or p*100%).
  • We take a random sample of n = 1004.
  • The proportion of our sample who are opposed
    ("p-hat") is an estimate of the proportion in
    the population.

7
A simulation
  • Choose a value to serve as p (say p = .6)
  • Our "data" consist of 1004 numbers: 0's represent
    those in favor, 1's are those opposed.
  • In x, 589 out of 1004 say "opposed", so p-hat =
    589/1004 = .5866
  • mean(x) = .5866
  • sd(x) = .4926
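
A minimal R sketch of how such "data" might be generated, assuming p = .6 as on the slide. The seed and variable names are illustrative assumptions, so the count of 1's will generally differ from the 589 reported above.

```r
# Assumption: the seed is arbitrary, chosen only so the run is repeatable.
set.seed(120)

p <- 0.6     # value chosen to serve as the population proportion
n <- 1004    # sample size, as in the Gallup poll

# 0 = in favor, 1 = opposed; each response is "opposed" with probability p
x <- sample(c(0, 1), n, replace = TRUE, prob = c(1 - p, p))

p_hat <- sum(x) / n   # sample proportion of 1's
print(p_hat)
print(mean(x))        # same as p_hat for 0/1 data
print(sd(x))          # close to sqrt(p_hat * (1 - p_hat))
```

For 0/1 data the mean is the sample proportion, which is why results about sample means carry over to proportions.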

8
xbar = .5866, s = .493
9
How do we know the sample proportion is a good
estimate of the population proportion?
  • Law of Large Numbers
  • sample averages (and proportions) converge on
    population values as the sample size grows
  • implying that for a finite sample, the sample
    proportion is likely to be close to the population
    value if the sample size is large
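
A short R sketch of the Law of Large Numbers for fair coin flips. The number of flips (5000) and the seed are assumptions made for illustration only.

```r
set.seed(1)   # arbitrary seed (assumption), for repeatability

# 1 = heads, 0 = tails, for 5000 flips of a fair coin
flips <- rbinom(5000, size = 1, prob = 0.5)

# proportion of heads after each flip
running_prop <- cumsum(flips) / seq_along(flips)

# proportion of heads after 10, 100, 1000 and 5000 flips
print(running_prop[c(10, 100, 1000, 5000)])

plot(running_prop, type = "l",
     xlab = "number of flips", ylab = "proportion of heads")
abline(h = 0.5, lty = 2)   # the population value
```

Early values can be far from 0.5; later values tend to stay close to it.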

10
Coin flips: sample proportion "settles down" to
0.5
11
So if we stop earlier, say n = 10
p-hat = .60
12
Which raises the question
  • If we stop early, how far away will our sample
    proportion be from the true value?
  • Or, in a survey setting, if we take a finite
    sample of n = 1004, how far off from the population
    proportion are we likely to be?

13
A simulation might help
  • Assume p = .60 (population proportion)
  • Take sample of n = 1004 and find p-hat.
  • Save this value
  • Repeat above 3 steps 10,000 times.

14
The R code (for the record)
  • phat <- c()                 # will hold the sample proportions
  • for (i in 1:10000) {
  •   x <- sample(c(0,1), 1004, replace=TRUE, prob=c(.4, .6))  # 1 = opposed
  •   temp <- sum(x)/1004       # p-hat for this sample
  •   phat <- c(phat, temp)     # save it
  • }
  • hist(phat)
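
The loop above grows phat one element at a time, which works but is slow in R. A vectorized sketch of the same simulation, offered as one possible alternative rather than the code used in the lecture, draws all 10,000 counts at once with rbinom(). The seed is an arbitrary assumption.

```r
set.seed(2007)     # arbitrary seed (assumption), for repeatability

n    <- 1004       # people per survey
p    <- 0.6        # assumed population proportion opposed
reps <- 10000      # number of simulated surveys

# Each draw is the number "opposed" in one survey of n people;
# dividing by n turns the counts into sample proportions.
phat <- rbinom(reps, size = n, prob = p) / n

hist(phat, main = "10,000 sample proportions, n = 1004")
print(range(phat))   # smallest and largest p-hat
```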

15
each dot represents one survey of 1004 people
16
10,000 sample proportions, n = 1004
17
Observe that...
  • sample proportions are centered on the true
    population value p = .60
  • variability is not great: smallest is .54,
    biggest is .66
  • distribution is bell-shaped

18
We've just witnessed the Central Limit Theorem
  • If samples are independent and random and
    sufficiently large
  • means (and proportions) follow a nearly Normal
    distribution
  • the mean of the Normal is the mean of the
    population
  • the SD of the Normal (aka the standard error) is
    the population SD divided by sqrt(n)

19
CLT applied to sample proportions
  • phat has an approximately Normal distribution
  • mean is p
  • SE is sqrt(p(1-p)/n)
  • For our simulation, p = .60 so our p-hats will be
    centered on .6 with a SD of sqrt(.6*.4/1004) =
    0.0155
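
The SE arithmetic can be checked in R. In this sketch, se_prop is a hypothetical helper name introduced for illustration; it is not from the lecture.

```r
# Standard error of a sample proportion under the CLT approximation.
# se_prop is a hypothetical helper name (assumption).
se_prop <- function(p, n) {
  sqrt(p * (1 - p) / n)
}

se <- se_prop(0.6, 1004)
print(round(se, 4))   # 0.0155

# Quadrupling the sample size halves the SE
print(se_prop(0.6, 4 * 1004) / se)   # 0.5
```

Because n sits under a square root, halving the SE requires four times as many respondents.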

20
We saw
  • Normal
  • mean(phat) = 0.600 (expected .6)
  • sd(phat) = 0.01554 (expected 0.0155)

21
In practice, we don't know p
  • but we can get a good approximation to the
    standard error using
  • sqrt(phat (1-phat)/n)
  • rather than
  • sqrt(p(1-p)/n)

22
So if we take a random sample of n = 1004
  • and we see p-hat = .61, we know that
  • The true value of p can't be far away.
  • SE = sqrt(.61*.39/1004) = 0.0154
  • So 68% of the time we do this, p will be within
    0.0154 of phat
  • And 95% of the time it will be within 2*.0154 ≈
    0.03

23
Which leads us to conclude
  • that the true proportion of the population that
    opposes a surge is somewhere in the interval from
    .61 - .03 = 0.58
  • to .61 + .03 = 0.64

24
Confidence intervals
  • This is an example of a 95% confidence interval.
  • Because 95% of all samples will produce a p-hat
    that is within 2 standard errors of the true
    value, we are 95% confident that ours is a "good"
    interval.

25
Formula
  • A 95% CI for a proportion is
  • estimate +/- 2 (Standard Error)
  • p-hat +/- 2*sqrt(phat(1-phat)/n)
  • 0.61 +/- 2*sqrt(.61*.39/1004)
  • (.58, .64)
  • note: substituting phat for p in the SE means we
    get an approximate value
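
The formula can be wrapped in a small R function. This is a sketch: ci_prop and its mult argument are hypothetical names introduced here, and the multiplier 2 follows the slide's rule of thumb.

```r
# ci_prop is a hypothetical helper (assumption): approximate 95% CI for a
# proportion, using the "2 standard errors" rule.
ci_prop <- function(phat, n, mult = 2) {
  se <- sqrt(phat * (1 - phat) / n)   # estimated SE, with phat in place of p
  c(lower = phat - mult * se, upper = phat + mult * se)
}

ci <- ci_prop(0.61, 1004)
print(round(ci, 2))   # lower 0.58, upper 0.64
```

Changing mult changes the confidence level; for example, mult = 1 gives roughly a 68% interval.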

26
What does 95% mean?
  • If we repeat this infinitely many times
  • take a sample of n = 1004 from population
  • calculate sample proportion
  • find an interval using +/- 2 SE
  • then 95% of these CIs will contain the truth and
    5% will not.
  • We see only one: (.58, .64). It is either good
    or bad, but we are confident it is good.
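
"Infinitely many times" cannot be run, but a finite simulation gives the idea. This R sketch assumes p = .6 as the truth, 10,000 repetitions, and an arbitrary seed. It counts how often the +/- 2 SE interval contains p.

```r
set.seed(95)       # arbitrary seed (assumption)

p    <- 0.6        # "truth", known only because this is a simulation
n    <- 1004
reps <- 10000

phat  <- rbinom(reps, size = n, prob = p) / n
se    <- sqrt(phat * (1 - phat) / n)     # estimated SE for each sample
lower <- phat - 2 * se
upper <- phat + 2 * se

covered <- (lower <= p) & (p <= upper)   # TRUE when the CI contains p
print(mean(covered))                     # should be close to 0.95
```

Each simulated interval is either good or bad; the 95% describes the long-run fraction that are good.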

27
Where did the 95% come from?
  • It came from the normal curve.
  • The CLT told us that p-hat followed an (approx.)
    normal distribution.
  • For Normal distributions, 68% of the probability
    is within 1 standard deviation of the mean, 95%
    within 2, 99.7% within 3.
  • A normal table gives other probabilities

28
Change confidence level by changing the width of
margin of error
[Figure: intervals around phat = 0.61 with 1 SE of about .015. Within 1 SE: 68%; 1.6 SEs: 90%; 2 SEs: 95%; 3 SEs: 99.7%.]
29
The CLT applies to
  • any linear combination of the observations
  • assuming observations are randomly sampled, and
    independent
  • it does NOT matter what the distribution of the
    population looks like
  • if n is small, the distribution will be only
    approximately normal, and this might be a very
    poor approximation
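
A sketch of the last two points in R, using a right-skewed Exponential(1) population. The choice of population, the sample sizes 5 and 100, the seed, and the helper names are all assumptions made for illustration.

```r
set.seed(29)   # arbitrary seed (assumption)

# Hypothetical helper: moment-based skewness (about 0 for a symmetric shape)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Means of `reps` samples of size n from an Exponential(1) population
sample_means <- function(n, reps = 10000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_small <- sample_means(5)     # small n
means_large <- sample_means(100)   # larger n

print(skewness(means_small))   # clearly skewed: poor Normal approximation
print(skewness(means_large))   # much closer to 0
```

Even though the population is far from Normal, the means for n = 100 are nearly symmetric, while those for n = 5 still show the population's skew.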

30
the CLT does NOT apply to
  • non-linear combinations, such as the sample
    median or the standard deviation
  • non-random samples
  • samples that are dependent

31
simulation
  • http://onlinestatbook.com/stat_sim/sampling_dist/index.html

32
Summary
  • Confidence Level is a statement about the
    sampling process, not the sample
  • Margin of error is determined to achieve the
    desired confidence level
  • We can calculate the confidence level only if we
    know the sampling distribution: the probability
    distribution of the sample statistic

33
Pop Quiz
  • Is the value 61% a statistic or a parameter?
  • The margin of error is given as 3%. What does
    the margin of error measure?
  • a) the variability in the sample
  • b) the variability in the population
  • c) the variability in repeated sampling

35
For next time
  • In WWII, the German army produced tanks with
    sequential serial numbers. The Allies captured a
    few tanks, and wanted to infer the total number
    of tanks produced.
  • Suppose you had captured 10 tanks. Come up with
    three estimators for the total number of tanks.
  • Data: 911, 5146, 6083, 944, 11944, 9365, 6087,
    6647, 7076, 12275