Inferential Statistics Part 1 - PowerPoint PPT Presentation

Provided by: uncEducou9

1
Inferential Statistics Part 1
  • Chapter 8
  • Pp. 253-278

2
Collecting a random sample
  • Goal: to understand characteristics of a
    population
  • Examples:
  • What's the average commuting time for city
    residents?
  • What's the average household income of the
    patrons of a particular grocery store?
  • What's the average leaf size of birch trees
    on August 1 in a particular state park?
  • What proportion of people in a particular
    tropical city have had malaria?

3
Estimating the mean
  • One of the most common goals of statistical
    inference is estimating a population mean with a
    sample mean

4
Central Limit Theorem
  • When we have n independent, identically
    distributed random variables (X1, ..., Xn), the
    mean of those random variables approaches a
    normal distribution with mean µ and variance
    σ²/n as n gets large.
  • Independence of random variables means that the
    value of one observation has no effect on the
    value of another observation.
  • Identical distribution of random variables means
    that each random variable comes from the same
    population (e.g., roll of a die, coin flip).

5
Simple random sampling
  • Each observation drawn does not depend on others
    drawn
  • Thus observations are independent
  • Each observation (i.e., each random variable) is
    identically distributed
  • The population has a distribution that doesn't
    change (each observation is randomly drawn from
    an identical distribution: the distribution of
    the population).
  • So the Central Limit Theorem applies! (when n is
    large)

6
What does this mean?
Suppose we take a sample of n = 50 observations
from a population that has this distribution:
[Figure: frequency plot of the population; tick values 0, 10, 20, 30]
Mean (µ) = 20, Variance (σ²) = 100, Std. dev. (σ) = 10
We then find the mean of this sample (suppose
this mean = 19). Take another sample of 50
observations and find the mean (suppose it's 24).
Do this many times, and we'll come up with a
distribution of means. The Central Limit Theorem
tells us this distribution will always look like
the next slide (as long as n is large, and 50
is large enough).

7
The normal curve
[Figure: normal curve of sample means; tick values 16, 18, 20, 22, 24]
Mean (µ) = 20, Sample size (n) = 50
Variance of sample mean = σ²/n = 100/50 = 2
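
The repeated-sampling experiment on slides 6 and 7 can be simulated. This is a minimal sketch, not the presenter's method. The slide's population shape is not recoverable, so a gamma-shaped population with mean 20 and variance 100 is assumed here purely for illustration. The number of repetitions and all names are also assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: gamma with shape 4 and scale 5 -> mean 20, variance 100.
SHAPE, SCALE = 4.0, 5.0
N = 50              # observations per sample, as on the slide
NUM_SAMPLES = 5000  # how many times the sampling is repeated (assumption)

def sample_mean(n):
    """Draw n independent observations and return their mean."""
    return statistics.fmean(random.gammavariate(SHAPE, SCALE) for _ in range(n))

means = [sample_mean(N) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(means)
variance_of_means = statistics.pvariance(means)

print(f"mean of sample means:     {mean_of_means:.2f}  (theory: 20)")
print(f"variance of sample means: {variance_of_means:.2f}  (theory: 100/50 = 2)")
```

Even though the assumed population is skewed, the sample means should cluster around 20 with a variance near 2, as the Central Limit Theorem predicts.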
8
Symbols
  • Population parameter: $\theta$
  • Estimate: $\hat{\theta}$
  • Expected value of the estimate: $E(\hat{\theta})$

9
Basic Types of Inference
  • Point Inference
  • The value of a population parameter is
    estimated using a single value
  • Examples: mean, standard deviation, etc.
  • Interval Inference
  • Attaching a probability to an estimate (i.e.,
    making a confidence interval)
  • Example: we are 95% confident that µ is between
    10 and 20

10
Judging the Quality of the Estimator
  • Bias: the difference between the expected value
    of the estimator, $E(\hat{\theta})$, and the
    population parameter, $\theta$
    (i.e., $E(\hat{\theta}) - \theta$)
  • Bias may be positive or negative (e.g., a
    positively biased estimator would indicate the
    population parameter is higher than it actually
    is)
  • Efficiency: how clustered the distribution of
    $\hat{\theta}$ is (i.e., how peaked its
    distribution is)

11
Judging the Quality of the Estimator
  • Best case scenario: to have an unbiased
    estimator with a high level of efficiency
  • We can measure the quality of the estimator using
    the Mean Squared Error (MSE) or its counterpart
    RMSE (the square root of the MSE):
    $MSE = E[(\hat{\theta} - \theta)^2] = Var(\hat{\theta}) + Bias^2$
  • Remember that the variance in this case is the
    variance of a random variable, so we use the
    equation
    $Var(\hat{\theta}) = E[(\hat{\theta} - E(\hat{\theta}))^2]$
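
Bias, variance, and MSE can be estimated by simulation. The sketch below is an illustration only, not taken from the slides. It compares two estimators of a population variance, one dividing by n and one by n - 1, on an assumed normal population. The population, sample size, trial count, and the helper name `quality` are all assumptions.

```python
import random
import statistics

random.seed(1)

SIGMA2 = 100.0  # true population variance (the parameter being estimated)
N = 10          # sample size (assumption)
TRIALS = 5000   # number of simulated samples (assumption)

def quality(estimates, true_value):
    """Return (bias, variance, mse) for a list of simulated estimates."""
    expected = statistics.fmean(estimates)
    bias = expected - true_value
    variance = statistics.pvariance(estimates, mu=expected)
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return bias, variance, mse

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(TRIALS):
    sample = [random.gauss(20.0, SIGMA2 ** 0.5) for _ in range(N)]
    divide_by_n.append(statistics.pvariance(sample))         # divides by n
    divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

for name, estimates in [("divide by n", divide_by_n),
                        ("divide by n-1", divide_by_n_minus_1)]:
    bias, variance, mse = quality(estimates, SIGMA2)
    print(f"{name:14s} bias={bias:8.2f} variance={variance:9.2f} "
          f"MSE={mse:9.2f} RMSE={mse ** 0.5:6.2f}")
```

The divide-by-n estimator should show a clearly negative bias (it underestimates the variance), while the n - 1 version should show a bias near zero. In both cases the MSE equals the variance plus the squared bias.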

12
Point Estimates (inferring population parameters
from samples)
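  • Population Mean: estimated by the sample mean,
    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • Population Proportion: estimated by the sample
    proportion, $\hat{p} = x/n$
  • Population Variance: estimated by the sample
    variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
  • Population Standard Deviation: estimated by the
    sample standard deviation, $s = \sqrt{s^2}$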
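
As a rough sketch, the four point estimates can be computed with Python's standard `statistics` module. The commute-time sample below is hypothetical data made up for illustration. The proportion reuses the 63-of-150 counts from Example 3.

```python
import statistics

# Hypothetical commute times in minutes (illustrative data, not from the slides).
sample = [12, 25, 18, 30, 22, 15, 27, 20]

x_bar = statistics.fmean(sample)  # point estimate of the population mean
s2 = statistics.variance(sample)  # sample variance (divides by n - 1)
s = statistics.stdev(sample)      # sample standard deviation

# Proportion: number of successes divided by the sample size (Example 3 counts).
successes, n = 63, 150
p_hat = successes / n

print(f"mean={x_bar:.3f}  variance={s2:.3f}  std dev={s:.3f}  proportion={p_hat:.2f}")
```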

13
Confidence Intervals
  • The degree of confidence we have in our estimates,
    defined by a percentage
  • Common examples: 90%, 95%, or 99% confident
  • The confidence level is defined using the symbol
    α
  • In confidence intervals, alpha (α) is the
    proportion of the time your confidence interval
    is wrong
  • The typical usage is $z_{\alpha/2}$
  • Why do we divide by 2?

14
Confidence Interval Example
  • What is the 95% confidence interval for a
    normally distributed variable?
  • α = 1 - desired confidence level
  • α = 1 - 0.95 = 0.05
  • Remember that we divide α by 2 since we have
    uncertainty both above and below the mean (i.e.,
    2 tails)
  • Therefore we use $z_{0.025}$ for the 95% confidence
    interval
  • From the z-table we find that $z_{0.025} = 1.96$
  • What does this mean?
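
The z-table lookup can be reproduced with the standard library. This is a small sketch in which the function name `z_critical` is an assumption. `NormalDist().inv_cdf` returns the value below which a given fraction of the standard normal distribution lies.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # alpha/2 is left in the upper tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_critical(level):.3f}")
```

For 95% confidence this gives the familiar 1.96: 95% of a standard normal distribution lies within 1.96 standard deviations of the mean.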

15
Interval Estimation (making confidence intervals
for population parameters estimated from samples)
  • Case 1: estimating an interval for µ when X is
    normally distributed and we know σ
  • This is the simplest case because normality
    allows us to use the z-table
  • This is also unlikely since it requires knowing
    the distribution and σ (which implies knowing
    µ already)

16
Example 1: Create a confidence interval for µ
  • A town is considering building a new bridge over
    a river. The primary goal is to reduce workers'
    commute times from a particular community. A
    random sample of workers in that community is
    asked to estimate their reduction in commute time
    if the bridge were built. Our goal is to
    estimate the mean reduction in commute time for
    the whole community if the bridge were built.
    Create a 95% confidence interval for this mean.

17
Example 1 Data
  • n = 100 workers are sampled
  • $\bar{x}$ = 17 minutes
  • σ = 30 minutes
  • What is the 95% confidence interval for the mean?

18
Constructing a confidence interval
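  • Construct a 95% confidence interval around the
    sample mean:
    $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 17 \pm 1.96 \cdot \frac{30}{\sqrt{100}} = 17 \pm 5.88$
  • So we can say that the 95% C.I. is 17 ± 5.88, or
    (11.12, 22.88)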
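
The Example 1 calculation can be checked in a few lines. This is a sketch assuming a known population standard deviation, as in Case 1. The function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known (Case 1)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 1: 100 workers, mean reduction 17 minutes, sigma 30 minutes.
lower, upper = z_interval(x_bar=17, sigma=30, n=100)
print(f"95% C.I.: ({lower:.2f}, {upper:.2f})")
```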

19
Example 1 Questions
  • What would happen to our interval if we used a
    99% confidence interval instead?
  • What would happen to our confidence interval if
    we sampled 200 people instead of 100 people?
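
One way to explore both questions is to compute the margin of error under each scenario. This is a minimal sketch in which `margin_of_error` is a hypothetical helper and the inputs are those of Example 1.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Half-width of the z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / math.sqrt(n)

base = margin_of_error(sigma=30, n=100, confidence=0.95)
wider = margin_of_error(sigma=30, n=100, confidence=0.99)     # more confidence
narrower = margin_of_error(sigma=30, n=200, confidence=0.95)  # larger sample

print(f"95%, n=100: +/- {base:.2f}")
print(f"99%, n=100: +/- {wider:.2f}")
print(f"95%, n=200: +/- {narrower:.2f}")
```

Raising the confidence level widens the interval, while doubling the sample size shrinks the margin by a factor of the square root of 2.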

20
Interval Estimation (making confidence intervals
for population parameters estimated from samples)
  • Case 2: estimating an interval for µ when X is
    not normally distributed and we know σ
  • In this case n matters a lot. Why?
  • This is also unlikely since it requires knowing
    the distribution and σ (which implies knowing
    µ already)

21
Interval Estimation (making confidence intervals
for population parameters estimated from samples)
  • Case 3: estimating an interval for µ when σ and
    the distribution are unknown
  • What should we use instead of σ?
  • Can we use the z-table in this case?
  • This case is what we see most commonly

22
t-distribution vs. z-distribution
  • When we only have s (and not σ) we use the
    t-distribution rather than the z-distribution
  • To do so we use the t-table
  • How are they different?
  • The t-distribution changes depending on the
    degrees of freedom (n - 1)
  • This is reflected in the table and in the symbol
    $t_{\alpha/2,\, n-1}$
  • The t-distribution accounts for more uncertainty
    (i.e., wider confidence intervals) since s is
    just an estimate of σ

23
t-distribution vs. z-distribution
  • As n approaches infinity t and z become equal
  • This means that even when we have s instead of σ
    we can use the z-distribution if n is large
  • Central Limit Theorem: "as n gets large"
  • What is large?
  • Rule of thumb: n ≥ 30
  • For n less than 30, the distribution of $\bar{x}$
    does not follow the normal distribution
    accurately enough.
  • But the distribution of $\bar{x}$ does closely
    follow a t-distribution for sample sizes of less
    than 30.
  • For this class, use the t-distribution any time
    you have s instead of σ
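
The convergence of t toward z can be seen numerically. Python's standard library has no t-table, so the sketch below rebuilds one under stated assumptions: it integrates the t density with Simpson's rule and inverts the result by bisection. The function names, step count, and search range are all choices made for this illustration, not a standard API.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

def t_critical(upper_tail, df):
    """Value t with P(T > t) = upper_tail, found by bisection on [0, 200]."""
    lo, hi = 0.0, 200.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if 1.0 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

z = NormalDist().inv_cdf(0.975)
for df in (5, 15, 30, 1000):
    print(f"df={df:5d}  t={t_critical(0.025, df):.3f}  z={z:.3f}")
```

The critical value shrinks toward z as the degrees of freedom grow, which is why the two tables agree for large n.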

24
Example 2
  • n = 16
  • $\bar{x}$ = 30
  • s² = 1600
  • What is the 95% C.I. for the mean?

25
Example 2
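  • s = 40
  • Degrees of freedom = n - 1 = 15
  • $t_{0.025,\, 15} = 2.131$ (from the t-table)
  • $\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} = 30 \pm 2.131 \cdot \frac{40}{\sqrt{16}} = 30 \pm 21.31$
  • The 95% confidence interval for the mean is
    (8.69, 51.31)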
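
The Example 2 arithmetic can be sketched as follows. The critical value 2.131 is entered by hand as a t-table lookup for 15 degrees of freedom and is not computed here. Variable names are illustrative.

```python
import math

n = 16
x_bar = 30.0
s = math.sqrt(1600.0)  # sample standard deviation, from s^2 = 1600
t_table = 2.131        # t_{0.025, 15}, read from a t-table

margin = t_table * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"95% C.I.: ({lower:.2f}, {upper:.2f})")
```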

26
Interval Estimation (making confidence intervals
for population parameters estimated from samples)
  • Case 4: estimating an interval for a population
    proportion p based on a sample proportion $\hat{p}$
  • Remember that $\hat{p} = x/n$
  • In other words, $\hat{p}$ is the number of successes
    divided by the sample size
  • For example, the proportion of people over 6 ft
    tall
  • In this case we don't need s or σ, but we do need
    the standard deviation of $\hat{p}$
  • Which we estimate as $\sqrt{\hat{p}(1 - \hat{p})/n}$

27
Interval Estimation (making confidence intervals
for population parameters estimated from samples)
  • Case 4 continued
  • Equation: $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$
  • We use the z-distribution for estimating an
    interval for a proportion p based on a sample
    proportion $\hat{p}$
  • This also limits us to using only large samples
    (in this case n > 100)
  • For smaller samples, we calculate the entire
    distribution using the binomial mass function
    (i.e., solve $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
    for all x values)
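
For the small-sample route, the binomial mass function can be evaluated for every possible x. The sketch below only tabulates the distribution. The slide does not say how the distribution is then turned into an interval, so that step is left out. The sample size and p are hypothetical values chosen for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

n, p = 20, 0.42  # hypothetical small sample and success probability
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(distribution):
    print(f"x={x:2d}  P(X=x)={prob:.4f}")
```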

28
Example 3
  • n = 150 people at a convention
  • 63 of the people sampled were over 6 feet tall
  • What is the 99% C.I. for the true proportion of
    all people over 6 ft tall at the convention?

29
Example 3
  • $\hat{p}$ = 63/150 = 0.42
  • 99% C.I. → $z_{0.005} = 2.58$ (from the z-table)
  • The 99% confidence interval for p, given
    $\hat{p}$ = 0.42, is (0.316, 0.524)
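
The Example 3 interval can be reproduced as a quick check. This sketch uses the rounded z-table value 2.58. Variable names are illustrative.

```python
import math

successes, n = 63, 150
p_hat = successes / n
z = 2.58  # z_{0.005}, the rounded z-table value for 99% confidence

standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin
print(f"99% C.I.: ({lower:.3f}, {upper:.3f})")
```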

30
Sample Size Determination
  • Often, before we draw a sample, we want to
    know how large a sample we need
  • Required sample sizes can be determined for
    population parameters (mean, proportions, etc.)
    by modifying the equations we've been going
    through
  • An additional component is the error (E)
  • This is basically the term that defines how far
    off we are willing to be (i.e., the margin of
    acceptable error)
  • Strictly speaking, E is one-half the difference
    between the upper and lower limits of the
    interval at a given confidence level
  • Note that E is not the same as the C.I.

31
Sample Size Determination
  • Equation for µ: $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
  • Equation for p: $n = \frac{z_{\alpha/2}^2\, p(1 - p)}{E^2}$
  • What obvious flaw do you see?

32
Example 4
  • A movie theatre wants to know the mean number of
    tickets sold per day. How many days must they
    count to know the mean daily ticket sales within
    100 tickets with 95% confidence?
  • From previous sales reports, it is determined
    that σ = 175

33
Example 4
  • What numbers do we plug into our equation?
  • What should $z_{\alpha/2}$ be?
  • What should E be?
  • Why don't we multiply this by 2?
  • What should σ be?

34
Example 4
  • z = 1.96
  • E = 100
  • σ = 175
  • n = number of days we should sample
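
Plugging these values into the sample-size formula for a mean, $n = (z\sigma/E)^2$, can be sketched as below. The function name is hypothetical. The result is rounded up because a fractional day cannot be sampled.

```python
import math

def sample_size_for_mean(z, sigma, error):
    """Smallest whole n for which z * sigma / sqrt(n) is at most `error`."""
    return math.ceil((z * sigma / error) ** 2)

days = sample_size_for_mean(z=1.96, sigma=175, error=100)
print(f"days to sample: {days}")
```

The raw formula gives about 11.76, so 12 days are needed.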

35
Example 5
  • A city council election is being held with
    several candidates expecting reasonably large
    returns.
  • To avoid a run-off between the top 2
    vote-getters, the leading candidate must receive
    at least 45% of the vote
  • How many people do we need to sample using exit
    polls to determine with 99% confidence and an
    acceptable error of 0.005 whether there will be a
    run-off vote?

36
Example 5
  • z = 2.58
  • E = 0.005
  • p = 0.45
  • n = number of people we should sample
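
The same approach works for a proportion, using $n = z^2 p(1 - p)/E^2$. This is a sketch with a hypothetical function name, again rounding up to a whole person.

```python
import math

def sample_size_for_proportion(z, p, error):
    """Smallest whole n for which z * sqrt(p(1-p)/n) is at most `error`."""
    return math.ceil(z ** 2 * p * (1.0 - p) / error ** 2)

voters = sample_size_for_proportion(z=2.58, p=0.45, error=0.005)
print(f"voters to sample: {voters}")
```

The tight margin of 0.005 is what drives the required sample into the tens of thousands: halving E quadruples n.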

37
Class Problem
  • Given this sample of middle school kids' heights
    (in inches):
  • 56, 64, 52, 69, 66, 64, 63, 46, 46, 49, 47, 60,
    54, 45, 45, 69, 62, 67, 49, 43, 59
  • What is the 99% confidence interval for the
    population mean (µ)?

38
Solution
  • n = 21
  • $\bar{x}$ = 1175/21 = 55.95
  • s = 8.96
  • $t_{\alpha/2,\, n-1} = t_{0.005,\, 20} = 2.845$
  • So the 99% C.I. for the population mean (µ) is
    (50.387, 61.513)
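
The class problem can be worked end to end from the raw heights. In this sketch the critical value 2.845 is entered by hand from a t-table, and the variable names are illustrative.

```python
import math
import statistics

heights = [56, 64, 52, 69, 66, 64, 63, 46, 46, 49, 47, 60,
           54, 45, 45, 69, 62, 67, 49, 43, 59]

n = len(heights)
x_bar = statistics.fmean(heights)
s = statistics.stdev(heights)  # sample standard deviation (n - 1 divisor)
t_table = 2.845                # t_{0.005, 20}, read from a t-table

margin = t_table * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"n={n}  mean={x_bar:.2f}  s={s:.2f}")
print(f"99% C.I.: ({lower:.3f}, {upper:.3f})")
```

Because this version carries the unrounded mean and standard deviation through the calculation, the limits can differ from the slide's hand-rounded figures in the last decimal places.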

39
For Friday
  • Come with questions about homework 6

For Monday
  • Read chapter 9 pages 280-306