Four major statistical categories - PowerPoint PPT Presentation

Provided by: unisanet
1
Four major statistical categories
Weeks 8 and 9
2
Week 9 objectives
  • Confidence intervals for population means
  • The t-distribution, and CIs based on t
  • CIs for differences in population means
  • Sample size determination for means
  • CIs for slope in simple linear regression
  • The four components of interval estimation for
    means and slopes
  • Assumptions and Conditions

3
Estimation: using results from a sample to
describe the population.
  • A manufacturer of designer jeans is using
    advertising to develop an expensive and classy
    image. The suggested retail price is $85.
  • The manufacturer is concerned that retailers are
    undermining her image by offering the jeans at
    discount prices. She needs to understand better
    what is happening.
  • What is the average selling price of the jeans at
    all retail stores?
  • She randomly samples retailers who sell her
    product and determines their price.
  • A major car manufacturer with plants in a number
    of countries wants to examine productivity in
    Australia and Korea.
  • What is the difference in the average number of
    cars produced between the plants in Australia and
    Korea?
  • The daily output is recorded for one randomly
    selected plant in Australia and a similar-sized
    plant in Korea.

4
Lecture Example 1
Find a 95% CI for the long-run mean petrol
consumption of your car under city driving
conditions, when the average consumption is 8.62
km/litre calculated from a random sample of 31
tanks of petrol that has an estimated standard
deviation 2.39 km/litre.
5
Review
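The general confidence interval formula
For a general parameter $\theta$ with estimate $\hat{\theta}$, the CI is
$\hat{\theta} \pm z \times \text{s.e.}(\hat{\theta})$, where z is the
z-value chosen to give the desired confidence level.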
6
1. Application to CIs for a population mean
The parameter of interest is the population mean
$\mu$. The estimate of the parameter is $\bar{x}$. The
standard error of $\bar{x}$ (or $\hat{\mu}$) is $s/\sqrt{n}$, where s is
the sample standard deviation.
7
Lecture Example 1, continued
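The worked calculation for this slide did not survive in the transcript. A minimal sketch of it, assuming the large-sample formula $\bar{x} \pm z \times s/\sqrt{n}$ and using only the figures given in Lecture Example 1 (the function name `z_interval` is made up for illustration):

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sd, n, confidence=0.95):
    """Approximate large-sample CI for a mean: mean ± z × sd/√n."""
    # Two-sided z-value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sd / sqrt(n)  # standard error of the sample mean
    return mean - z * se, mean + z * se


# Lecture Example 1: mean 8.62 km/litre, s = 2.39 km/litre, n = 31 tanks.
low, high = z_interval(8.62, 2.39, 31)
print(f"95% CI: ({low:.2f}, {high:.2f}) km/litre")
```

This reproduces the interval of 7.78 to 9.46 km/litre quoted later in the lecture as the z-based answer.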
8
2. A better CI: exact, even for small samples
  • The previous CI for a population mean is
    approximate, and should only be used for large
    sample sizes n
  • There is an alternative form which is accurate
    even for small sample sizes
  • It uses a new distribution, the t-distribution
  • But it relies on the assumption that the data are
    drawn from a normal distribution

9
Standardisation and Studentisation
The t-distribution was 'discovered' by W.S.
Gosset. He wrote under the pen-name of
'Student'. So the t-distribution is often called
'Student's t-distribution'.
$Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ is Standardisation;
$T = (\bar{x} - \mu)/(s/\sqrt{n})$ is Studentisation.
10
What is the t-distribution?
  • The t-distribution looks like the standard normal
    but it has thicker tails
  • The thicker tails reflect the extra variability of
    variables from a t-distribution
  • This extra variability arises because
    Studentisation divides by the sample standard
    deviation, which is a variable, not a constant
  • See the textbook for more details

11
What are degrees of freedom?
  • Every t-distribution has a parameter k, the
    degrees of freedom
  • This measures the amount of information about
    variance (or standard deviation)
  • When k is large, the t-distribution is very like
    the standard normal
  • Can you see why this makes sense?

12
What is the t-distribution?
Pdfs of t-distributions: the t pdf is close to
the standard normal z even for 10 degrees of
freedom.
[Figure: pdfs of Z, t3 and t10]

13
CIs using the t-distribution
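Whenever a population standard deviation $\sigma$ is
replaced by an estimated standard deviation s,
replace z-values by t-values:
Approximate (large n): $\bar{x} \pm z \times s/\sqrt{n}$
Exact (normal data): $\bar{x} \pm t_{n-1} \times s/\sqrt{n}$
If the sample size n is large, then the $t_{n-1}$
value is close to z and the estimation of $\sigma$ by s
is accurate, so the top (z-based) formula is OK.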
14
What is the $t_{n-1}$ value in this CI
expression?
(It helps to draw a diagram)
Because the t-distribution has more variability
than the standard normal z, these critical values
will exceed 1.96. But if n is large, they will
be close to 1.96
15
Critical values of t from Minitab
Calc > Probability Distributions > t
distribution, then select degrees of freedom,
probability level and the Inverse Cumulative
Probability option.

Degrees of freedom k    97.5% point of t
          5                 2.5706
         10                 2.2281
         30                 2.0423
         50                 2.0086
        100                 1.9840
        500                 1.9647

As k increases these critical values approach
1.96, the corresponding z-value.
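As a quick check of that convergence, the sketch below compares the tabulated t critical values with the z-value. The t-values are copied from the Minitab table on this slide; only the z-value is computed, using Python's standard library.

```python
from statistics import NormalDist

# 97.5% points of t, copied from the Minitab table (key = degrees of freedom k).
T_975 = {5: 2.5706, 10: 2.2281, 30: 2.0423, 50: 2.0086, 100: 1.9840, 500: 1.9647}

# Corresponding 97.5% point of the standard normal distribution.
z_975 = NormalDist().inv_cdf(0.975)

for k, t in T_975.items():
    # The excess of t over z shrinks as the degrees of freedom grow.
    print(f"k = {k:>3}: t = {t:.4f}, t - z = {t - z_975:.4f}")
```

At k = 30 the t-value exceeds z by less than 0.1, which is the motivation for the rule of thumb on the next slide.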
16
When to use z, not t?
  • A convenient rule of thumb is that when the
    degrees of freedom k is greater than 30, the
    t-distribution is close enough to the standard
    normal z that the refinement of using t is
    unnecessary.

17
Vending machine profits
Example (9.3.1) of the textbook
A take-away store installs a vending machine.
During the first seven weeks of operation, its
profit figures ($) are 390, 377, 402, 363,
415, 393 and 387. Estimate and give a 90%
confidence interval for the long-run mean weekly
profit.

Variable   N    Mean   StDev  SE Mean
profits    7  389.57   16.75     6.33
18
Example (9.3.1), continued
(from the formula for CIs for a population mean)
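The hand calculation for this slide is missing from the transcript. A sketch of it from the raw profit figures follows. One value is an assumption not found in the transcript: the 95% point of t with 6 degrees of freedom (1.9432), taken from standard t tables, which is the multiplier for a two-sided 90% interval.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly profits ($) for the first seven weeks, from Example (9.3.1).
profits = [390, 377, 402, 363, 415, 393, 387]

n = len(profits)
xbar = mean(profits)   # sample mean
s = stdev(profits)     # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)       # standard error of the mean

# Assumed value: 95% point of t with n - 1 = 6 DF, from standard t tables.
T6_95 = 1.9432

low, high = xbar - T6_95 * se, xbar + T6_95 * se
print(f"mean = {xbar:.2f}, StDev = {s:.2f}, SE Mean = {se:.2f}")
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

The summary statistics agree with the Minitab output (389.57, 16.75, 6.33), and the interval agrees with Minitab's (377.27, 401.87) to two decimal places.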
19
or by Minitab
(Stat > Basic Statistics > 1 Sample t, then
select the confidence interval option)

T Confidence Intervals
Variable   N    Mean   StDev  SE Mean   90.0% CI
profits    7  389.57   16.75     6.33   (377.27, 401.87)
20
Lecture Exercise 1
  • Find a 95% CI for the long-run mean petrol
    consumption of Example 1 using t distribution.
  • Choose the appropriate t-value from the Minitab
    output below.
  • How does this CI compare to the one obtained
    using z-value?

Student's t distribution with 30 DF
P( X <= x )        x
     0.9000   1.3104
     0.9500   1.6973
     0.9750   2.0423
     0.9900   2.4573
     0.9950   2.7500
21
Solution
We are 95% confident that the average fuel
consumption is between 7.74 and 9.50
km/litre. (Previous answer was 7.78 to 9.46)
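The arithmetic behind this solution can be sketched as follows, using the figures from Lecture Example 1 and the 0.9750 entry (2.0423) from the Minitab output for 30 degrees of freedom. The variable names are illustrative only.

```python
from math import sqrt

xbar, s, n = 8.62, 2.39, 31   # Lecture Example 1 summary statistics
t30 = 2.0423                  # 97.5% point of t with n - 1 = 30 DF (Minitab output)

se = s / sqrt(n)              # standard error of the mean
low, high = xbar - t30 * se, xbar + t30 * se

print(f"s.e. = {se:.4f}")
print(f"95% CI using t: ({low:.2f}, {high:.2f}) km/litre")
```

The t-based interval is slightly wider than the z-based one because 2.0423 exceeds 1.96.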
22
3. Confidence interval for a difference between
two population means
23
How to find a confidence interval for the
difference between two population means?
24
Part of Minitab output: CIs for differences
between means; see Textbook Example (9.4.1).
Which assumes equal variances? Answer: the first.
25
How to decide between equal and unequal variances?
26
Lecture Exercise 2
  • For the Textbook Example (9.4.1), verify the 90%
    CI for the difference between means assuming
    equal variances given by Minitab.
  • The formula for the standard error in this case
    is $\text{s.e.} = s\sqrt{1/n_1 + 1/n_2}$
  • where s is the pooled StDev from Minitab.

Student's t distribution with 8 DF
P( X <= x )        x
     0.9000   1.3968
     0.9500   1.8595
     0.9750   2.3060
     0.9900   2.8965
     0.9950   3.3554
27
Solution
  • The degrees of freedom is n1 + n2 - 2 = 5 + 5 -
    2 = 8, which is less than 30, so we use t.
  • For a 90% CI choose the t8-value 1.8595.
  • The standard error is $s\sqrt{1/5 + 1/5}$, with
    s the pooled StDev.
  • Then the CI is
    $(\bar{x}_1 - \bar{x}_2) \pm 1.8595 \times \text{s.e.}$
  • and the confidence interval is (-14.5, -3.5).
  • Thus we are 90% confident that for an office
    worker leaving home early, the average travel
    time to work is between 3.5 and 14.5 minutes
    shorter than for an office worker leaving home
    later.
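
The textbook's sample means and pooled StDev are not reproduced in this transcript, so the interval cannot be rebuilt from the raw data here. As a consistency check only, the sketch below works backwards from the stated interval (-14.5, -3.5) and the t8-value 1.8595 to the difference in means, standard error and pooled StDev that they imply. These implied values are back-calculated, not taken from the textbook.

```python
from math import sqrt

t8 = 1.8595               # 95% point of t with 8 DF (Minitab output)
low, high = -14.5, -3.5   # 90% CI stated in the solution
n1 = n2 = 5               # sample sizes from the solution

diff = (low + high) / 2         # implied difference in sample means
half_width = (high - low) / 2   # equals t8 × s.e.
se = half_width / t8            # implied standard error
# Invert s.e. = s × sqrt(1/n1 + 1/n2) to get the implied pooled StDev.
pooled_sd = se / sqrt(1 / n1 + 1 / n2)

print(f"implied difference in means: {diff:.1f}")
print(f"implied s.e.: {se:.3f}, implied pooled StDev: {pooled_sd:.2f}")

# Forward check: the formula diff ± t8 × s.e. returns the stated interval.
rebuilt = (diff - t8 * se, diff + t8 * se)
print(f"rebuilt CI: ({rebuilt[0]:.1f}, {rebuilt[1]:.1f})")
```

Comparing the implied pooled StDev with the value in the textbook's Minitab output is one way to complete the verification the exercise asks for.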

28
4. Sample size determination formula
  • For a confidence interval of accuracy e (that is,
    estimate ± e) and given confidence, the required
    sample size is $n = (z\sigma/e)^2$
  • where $\sigma^2$ is the population variance and z is
    the z-value for the chosen confidence level.
  • Some prior information about $\sigma^2$ is essential.

29
Textbook Example (9.6.1)
  • A sample of only 7 observations produced a 90% CI
    for average weekly soft drink machine profit, of
    width about $25.
  • The standard deviation was $16.75.
  • To obtain a more precise estimate of weekly
    profit, i.e. a 99% CI with accuracy ±$5 (width
    $10), how many additional observations will be
    needed?

30
Example (9.6.1), continued
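The worked answer for this slide is not in the transcript. A minimal sketch, assuming the sample size formula $n = (z\sigma/e)^2$ with the sample StDev 16.75 standing in for $\sigma$, and rounding up to the next whole observation (the function name `required_n` is made up for illustration):

```python
from math import ceil
from statistics import NormalDist


def required_n(sigma, e, confidence):
    """Smallest n giving a CI of half-width e at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-value
    return ceil((z * sigma / e) ** 2)               # round up to a whole observation


# Example (9.6.1): sigma taken as 16.75, accuracy ±5, 99% confidence.
n_needed = required_n(16.75, 5, 0.99)
additional = n_needed - 7   # 7 observations are already in hand

print(f"total sample size: {n_needed}, additional observations: {additional}")
```

Under these assumptions about 75 observations are required in total, so about 68 more than the 7 already collected.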
31
Lecture Exercise 3
  • What sample size is needed to obtain a 90%
    confidence interval for an unknown population
    mean, with accuracy 3 units, where the population
    standard deviation is known from long past
    experience to be around 5.5 units?

32
Solution
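The slide's working is missing from the transcript. A sketch of the calculation, assuming the sample size formula $n = (z\sigma/e)^2$ with the standard 90% z-value of 1.645, $\sigma = 5.5$ and $e = 3$:

```latex
n = \left(\frac{z\,\sigma}{e}\right)^{2}
  = \left(\frac{1.645 \times 5.5}{3}\right)^{2}
  \approx (3.016)^{2}
  \approx 9.1
```

Rounding up to a whole observation gives a sample size of 10.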
33
5. CIs for slope in simple linear regression
34
Minitab output for s.e. of the slope estimate:
selling price of houses (y) and LGA (local
government area valuation, x). Units are $000.
The standard error of the slope estimate is 0.107
and the sample size is 8.
35
Lecture Example 2
Find a 95% confidence interval for the slope in
the LGA example.
  • Answer: The degrees of freedom is n - 2 = 6, which
    is less than 30, so use t. The t-value from
    Minitab is 2.4469.

Student's t distribution with 6 DF
P( X <= x )        x
     0.9750   2.4469
The CI is $b_1 \pm t_6 \times \text{s.e.}(b_1)$, i.e.
$1.285 \pm 2.4469 \times 0.107$. So the CI is (1.023, 1.547).
We are 95% confident that the true slope is
between 1.023 and 1.547.
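The slope interval uses the same estimate ± t × standard error pattern, with n - 2 degrees of freedom because simple linear regression estimates two coefficients. A short check with the figures from the slide:

```python
b1 = 1.285       # estimated slope from the LGA example
se_b1 = 0.107    # standard error of the slope (Minitab output)
n = 8            # sample size

df = n - 2       # degrees of freedom for simple linear regression
t6 = 2.4469      # 97.5% point of t with 6 DF (Minitab output)

low, high = b1 - t6 * se_b1, b1 + t6 * se_b1
print(f"df = {df}, 95% CI for slope: ({low:.3f}, {high:.3f})")
```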
36
Confidence intervals for predictions in simple
linear regression
  • Use Minitab to carry out regression, using
    Stat > Regression > Regression, as usual.
  • Under Options, enter the value of x under
    Prediction interval for new observations.
  • Select the confidence level.
  • Minitab calculates an estimated predicted
    expected value, its standard error, and a
    confidence interval.

37
Lecture Example 3
How to find a 95% CI for the predicted average
house selling price when LGA = $140,000 in Example
(5.4.1)?
Follow the steps on the previous slide. Minitab
output is as follows:

Predicted Values
   Fit  StDev Fit        95.0% CI
185.51       5.83  (171.23, 199.78)

So the required CI is $171,000 to $200,000.
38
7. Conditions and Assumptions for means
  • Well defined continuous variable
  • Representative sample
  • A large sample size or normally distributed
    variable
  • Independence
  • If d.f. > 30, central limit effects make the
    normality condition OK. Or, if the sample comes
    from a normal distribution, this condition is met
    no matter how small the sample size is.

39
Some conditions are usually satisfied easily
  • such as the well-defined variables, the
    representative sample, and independence
    conditions
  • Having degrees of freedom > 30 helps central
    limit effects to make the normality condition OK
  • Having equal sample sizes reduces any bad effects
    of wrongly assuming equal variances, and also
    helps central limit effects to make the normality
    condition OK.

40
6. The four components for interval estimation
  • Examples discussed have concerned means, slopes,
    predictions and differences between means
  • In each case the four components (parameter,
    estimate, standard error and confidence interval)
    can be identified
  • Sometimes a decision is needed on whether to use
    z or t: check whether the degrees of freedom is
    greater or less than 30

41
Summary of four components
  • For the population proportion, always use the
    z-value
  • For the population mean, use t, but if d.f. are
    > 30 you can use z

42
Summary: Degrees of freedom formulas for t
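The slide's own table is not in the transcript. The formulas below are the ones implied by the worked examples in this lecture (30 DF for n = 31 tanks, 8 DF for two samples of 5, and 6 DF for a regression on 8 houses):

```latex
\begin{array}{ll}
\text{One population mean:} & k = n - 1 \\
\text{Difference of two means (equal variances):} & k = n_1 + n_2 - 2 \\
\text{Slope in simple linear regression:} & k = n - 2
\end{array}
```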