Title: Four major statistical categories
Weeks 8 and 9
Week 9 objectives
- Confidence intervals for population means
- The t-distribution, and CIs based on t
- CIs for differences in population means
- Sample size determination for means
- CIs for slope in simple linear regression
- The four components of interval estimation for means and slopes
- Assumptions and Conditions
Estimation: using results from a sample to describe the population.
- A manufacturer of designer jeans is using advertising to develop an expensive and classy image. The suggested retail price is $85.
- The manufacturer is concerned that retailers are undermining her image by offering the jeans at discount prices. She needs to understand better what is happening.
- What is the average selling price of the jeans at all retail stores?
- She randomly samples retailers who sell her product and determines their price.
- A major car manufacturer with plants in a number of countries wants to examine productivity in Australia and Korea.
- What is the difference in the average number of cars produced between the plants in Australia and Korea?
- The daily output is recorded for one randomly selected plant in Australia and a similar-sized plant in Korea.
Lecture Example 1
Find a 95% CI for the long-run mean petrol consumption of your car under city driving conditions, when a random sample of 31 tanks of petrol gives an average consumption of 8.62 km/litre with an estimated standard deviation of 2.39 km/litre.
Review
The general confidence interval formula
For a general parameter, the confidence interval is \( \text{estimate} \pm z \times \text{s.e.}(\text{estimate}) \), where the z-value is chosen to give the desired confidence level.
1. Application to CIs for a population mean
The parameter of interest is the population mean \( \mu \). The estimate of the parameter is the sample mean \( \bar{x} \). The standard error of \( \bar{x} \) is \( s/\sqrt{n} \), where s is the sample standard deviation and n is the sample size.
Lecture Example 1, continued
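The working for this slide did not survive in the transcript. A minimal sketch of the calculation, assuming the large-sample formula (sample mean ± z × s/√n) from the review, could look like this. The function name `z_confidence_interval` is illustrative only, not part of the lecture material.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence):
    """Large-sample CI for a population mean: mean +/- z * sd / sqrt(n)."""
    # For 95% confidence we need the 97.5% point of the standard normal.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Figures from Lecture Example 1: 31 tanks, mean 8.62, SD 2.39 km/litre.
lower, upper = z_confidence_interval(mean=8.62, sd=2.39, n=31, confidence=0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) km/litre")
```

Rounded to two decimals this gives 7.78 to 9.46 km/litre, the z-based answer quoted later in the lecture.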
2. A better CI: exact, even for small samples
- The previous CI for a population mean is approximate, and should only be used for large sample sizes n
- There is an alternative form which is accurate even for small sample sizes
- It uses a new distribution, the t-distribution
- But it relies on the assumption that the data are drawn from a normal distribution
Standardisation and Studentisation
The t-distribution was 'discovered' by W.S. Gosset. He wrote under the pen-name 'Student', so the t-distribution is often called 'Student's t-distribution'.
\( (\bar{X} - \mu)/(\sigma/\sqrt{n}) \) is Standardisation; \( (\bar{X} - \mu)/(s/\sqrt{n}) \) is Studentisation.
What is the t-distribution?
- The t-distribution looks like the standard normal but it has thicker tails
- This represents more variability for variables from a t-distribution
- The extra variability arises because the standardisation divides by a sample standard deviation, which is itself a variable, not a constant
- See the textbook for more details
What are degrees of freedom?
- Every t-distribution has a parameter k, the degrees of freedom
- This measures the amount of information about variance (or standard deviation)
- When k is large, the t-distribution is very like the standard normal
- Can you see why this makes sense?
What is the t-distribution?
[Figure: pdfs of t-distributions.] The t pdf is close to the standard normal z even for 10 degrees of freedom.
CIs using the t-distribution
Whenever a population standard deviation \( \sigma \) is replaced by an estimated standard deviation s, replace z-values by t-values: \( \bar{x} \pm z \, s/\sqrt{n} \) becomes \( \bar{x} \pm t_{n-1} \, s/\sqrt{n} \).
If the sample size n is large, then the t-value is close to z and the estimation of \( \sigma \) by s is accurate, so the first (z-based) formula is OK.
What is the t-value in this CI expression?
(It helps to draw a diagram.)
Because the t-distribution has more variability than the standard normal z, the critical values of t for 95% confidence will exceed 1.96. But if n is large, they will be close to 1.96.
Critical values of t from Minitab
(Calc > Probability distributions > t distribution, then select degrees of freedom, probability level and the Inverse Cumulative Probability option)
Degrees of freedom k    97.5% point of t
5                       2.5706
10                      2.2281
30                      2.0423
50                      2.0086
100                     1.9840
500                     1.9647
As k increases these critical values approach 1.96, the corresponding z-value.
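A small sketch can make the convergence concrete. It copies the 97.5% points of t from the Minitab table (they are not computed here) and compares each with the matching z-value from Python's standard library. The variable names are illustrative.

```python
from statistics import NormalDist

# 97.5% points of t, copied from the Minitab table (not computed here).
t_975 = {5: 2.5706, 10: 2.2281, 30: 2.0423, 50: 2.0086, 100: 1.9840, 500: 1.9647}

# Matching 97.5% point of the standard normal, about 1.96.
z_975 = NormalDist().inv_cdf(0.975)

for k, t in t_975.items():
    # The excess of t over z shrinks as the degrees of freedom k grow.
    print(f"k = {k:>3}: t = {t:.4f}, excess over z = {t - z_975:.4f}")
```

Every tabulated t-value is larger than z, and the gap narrows steadily as k grows.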
When to use z, not t?
- A convenient rule of thumb is that when the degrees of freedom k > 30, the t-distribution is close enough to the standard normal z that the refinement of using t is unnecessary.
Vending machine profits
Example (9.3.1) of the textbook
A take-away store installs a vending machine. During the first seven weeks of operation, its profit figures ($) are 390, 377, 402, 363, 415, 393 and 387. Estimate the long-run mean weekly profit and give a 90% confidence interval for it.
Variable   N   Mean     StDev   SE Mean
profits    7   389.57   16.75   6.33
Example (9.3.1), continued
(from the formula for CIs for a population mean)
or by Minitab
(Stat > Basic Statistics > 1 Sample t, then select the confidence interval option)
T Confidence Intervals
Variable   N   Mean     StDev   SE Mean   90.0% CI
profits    7   389.57   16.75   6.33      (377.27, 401.87)
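The hand calculation can be sketched from the raw profit figures as follows. One value is an assumption not shown in the lecture: the 95% point of t with 6 degrees of freedom, 1.9432, taken from standard t tables. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly profit figures ($) from Example (9.3.1).
profits = [390, 377, 402, 363, 415, 393, 387]

# Assumed: 95% point of t with n - 1 = 6 d.f., from standard t tables.
# A 90% CI leaves 5% in each tail, hence the 95% point.
T_6_95 = 1.9432

n = len(profits)
xbar = mean(profits)   # sample mean
s = stdev(profits)     # sample standard deviation (divisor n - 1)
se = s / sqrt(n)       # standard error of the mean
margin = T_6_95 * se
lower, upper = xbar - margin, xbar + margin

print(f"mean = {xbar:.2f}, StDev = {s:.2f}, SE Mean = {se:.2f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

The printed summary should agree with the Minitab figures to two decimal places.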
Lecture Exercise 1
- Find a 95% CI for the long-run mean petrol consumption of Example 1 using the t-distribution.
- Choose the appropriate t-value from the Minitab output below.
- How does this CI compare to the one obtained using the z-value?
Student's t distribution with 30 DF
P( X < x )        x
0.9000       1.3104
0.9500       1.6973
0.9750       2.0423
0.9900       2.4573
0.9950       2.7500
Solution
We are 95% confident that the average fuel consumption is between 7.74 and 9.50 km/litre. (The previous answer, using z, was 7.78 to 9.46.)
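A short sketch of this solution, using the t-value 2.0423 (30 DF, 0.9750 row of the Minitab output) and, for comparison, the rounded z-value 1.96. The variable names are illustrative.

```python
from math import sqrt

T_30_975 = 2.0423  # 97.5% point of t with 30 d.f., from the Minitab output
Z_975 = 1.96       # matching z-value, rounded as in the lecture

# Figures from Lecture Example 1.
xbar, s, n = 8.62, 2.39, 31
se = s / sqrt(n)  # standard error of the mean

t_ci = (xbar - T_30_975 * se, xbar + T_30_975 * se)
z_ci = (xbar - Z_975 * se, xbar + Z_975 * se)

print(f"t-based 95% CI: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")
print(f"z-based 95% CI: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
```

The t-based interval is slightly wider, reflecting the extra uncertainty from estimating the standard deviation.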
3. Confidence interval for a difference between two population means
How to find a confidence interval for the difference between two population means?
Part of Minitab output: CIs for differences between means, see Textbook Example (9.4.1)
Which assumes equal variances? Answer: the first.
How to decide between equal and unequal variances?
Lecture Exercise 2
- For the Textbook Example (9.4.1), verify the 90% CI for the difference between means, assuming equal variances, given by Minitab.
- The formula for the standard error in this case is \( s\sqrt{1/n_1 + 1/n_2} \), where s is the pooled StDev from Minitab.
Student's t distribution with 8 DF
P( X < x )        x
0.9000       1.3968
0.9500       1.8595
0.9750       2.3060
0.9900       2.8965
0.9950       3.3554
Solution
- The degrees of freedom is \( n_1 + n_2 - 2 = 5 + 5 - 2 = 8 \), which is less than 30, so we use t.
- For a 90% CI choose the \( t_8 \)-value 1.8595.
- The standard error is \( s\sqrt{1/5 + 1/5} \), with s the pooled StDev from Minitab.
- Then the CI is \( (\bar{x}_1 - \bar{x}_2) \pm 1.8595 \times \text{s.e.} \),
- and the confidence interval is (-14.5, -3.5).
- Thus we are 90% confident that for an office worker leaving home early, the average travel time to work is between 3.5 and 14.5 minutes shorter than for an office worker leaving home later.
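The textbook's sample means and pooled StDev are not reproduced in this transcript, so the sketch below works backwards from the reported interval (-14.5, -3.5) to the difference and standard error it implies, then rebuilds the interval with the pooled formula. The back-calculated values are approximations implied by the rounded interval, not the textbook's own figures, and the function name `pooled_ci` is illustrative.

```python
from math import sqrt


def pooled_ci(diff, s_pooled, n1, n2, t_value):
    """CI for a difference in means assuming equal variances."""
    se = s_pooled * sqrt(1 / n1 + 1 / n2)  # pooled standard error
    return diff - t_value * se, diff + t_value * se


# Quantities stated in the solution.
lower_reported, upper_reported = -14.5, -3.5
n1 = n2 = 5
t8 = 1.8595  # 95% point of t with 8 d.f. (for a 90% CI)

# Back-calculated (approximate) values implied by the reported interval.
diff = (lower_reported + upper_reported) / 2              # centre of the CI
se_implied = (upper_reported - lower_reported) / (2 * t8)  # half-width / t
s_implied = se_implied / sqrt(1 / n1 + 1 / n2)             # implied pooled StDev

lower, upper = pooled_ci(diff, s_implied, n1, n2, t8)
print(f"implied difference = {diff:.1f}, implied s.e. = {se_implied:.3f}")
print(f"rebuilt 90% CI: ({lower:.1f}, {upper:.1f})")
```

With the actual sample means and pooled StDev from the textbook, `pooled_ci` could be called directly on those values instead.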
4. Sample size determination formula
- For a confidence interval of accuracy e and a given confidence level, the required sample size is \( n = z^2 \sigma^2 / e^2 \),
- where \( \sigma^2 \) is the population variance and z is the z-value for the chosen confidence level.
- Some prior information about \( \sigma^2 \) is essential.
Textbook Example (9.6.1)
- A sample of only 7 observations produced a 90% CI for average weekly soft drink machine profit, of width about $25.
- The standard deviation was $16.75.
- To obtain a more precise estimate of weekly profit, i.e. a 99% CI with accuracy ±$5 (width $10), how many additional observations will be needed?
Example (9.6.1), continued
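The working for this slide did not survive in the transcript. A minimal sketch, assuming the usual formula n = (z × σ / e)² rounded up to the next whole observation, with the earlier sample StDev of 16.75 standing in for σ. The function name `required_sample_size` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sd, accuracy, confidence):
    """Smallest n with z * sd / sqrt(n) <= accuracy."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-value
    return ceil((z * sd / accuracy) ** 2)


# 99% CI with accuracy +/- $5, using the earlier StDev of 16.75.
n_required = required_sample_size(sd=16.75, accuracy=5, confidence=0.99)
additional = n_required - 7  # 7 observations are already in hand

print(f"required n = {n_required}, additional observations = {additional}")
```

Under these assumptions the sketch gives a total of 75 observations, so 68 more than the 7 already collected.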
Lecture Exercise 3
- What sample size is needed to obtain a 90% confidence interval for an unknown population mean, with accuracy 3 units, where the population standard deviation is known from long past experience to be around 5.5 units?
Solution
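The worked solution did not survive in the transcript. A sketch of the arithmetic, assuming the sample size formula n = (z × σ / e)² and the 90% z-value 1.645 from standard normal tables:

```latex
n = \left(\frac{z\,\sigma}{e}\right)^2
  = \left(\frac{1.645 \times 5.5}{3}\right)^2
  \approx 9.1
```

Rounding up, a sample of about 10 observations would be needed under these assumptions.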
5. CIs for slope in simple linear regression
Minitab output for s.e. of the slope estimate
Selling price of houses (y) and LGA (local government area valuation, x). Units are $000.
The standard error of the slope estimate is 0.107 and the sample size is 8.
Lecture Example 2
Find a 95% confidence interval for the slope in the LGA example.
- Answer: The degrees of freedom is 6, which is less than 30, so use t. The t-value from Minitab is
Student's t distribution with 6 DF
P( X < x )        x
0.9750       2.4469
The CI is \( b_1 \pm t_6 \times \text{s.e.}(b_1) \), i.e. \( 1.285 \pm 2.4469 \times 0.107 \). So the CI is (1.023, 1.547).
We are 95% confident that the true slope is between 1.023 and 1.547.
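The same arithmetic as a short sketch, using only the figures quoted in this example. The degrees of freedom are taken as n - 2, which matches the 6 DF used with a sample of 8. The variable names are illustrative.

```python
# Figures quoted in Lecture Example 2.
b1 = 1.285     # slope estimate
se_b1 = 0.107  # standard error of the slope
n = 8          # sample size
t6 = 2.4469    # 97.5% point of t with 6 d.f., from the Minitab output

df = n - 2     # simple linear regression estimates two coefficients
margin = t6 * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"d.f. = {df}, 95% CI for slope: ({lower:.3f}, {upper:.3f})")
```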
Confidence intervals for predictions in simple linear regression
- Use Minitab to carry out regression, using Stat > Regression > Regression, as usual.
- Under Options, enter the value of x under Prediction interval for new observations.
- Select the confidence level.
- Minitab calculates an estimated predicted expected value, its standard error, and a confidence interval.
Lecture Example 3
How to find a 95% CI for the predicted average house selling price when LGA = $140,000 in Example (5.4.1)?
Follow the steps on the previous slide. Minitab output is as follows:
Predicted Values
Fit       StDev Fit   95.0% CI
185.51    5.83        (171.23, 199.78)
So the required CI is $171,000 to $200,000.
7. Conditions and Assumptions for means
- Well defined continuous variable
- Representative sample
- A large sample size or normally distributed variable
- Independence
- If d.f. > 30, central limit effects make the normality condition OK. Alternatively, if the data follow a normal distribution, this condition is met no matter how small the sample size is.
Some conditions are usually satisfied easily
- such as the well-defined variable, the representative sample, and independence conditions
- Having degrees of freedom > 30 helps central limit effects to make the normality condition OK
- Having equal sample sizes reduces any bad effects of wrongly assuming equal variances, and also helps central limit effects to make the normality condition OK.
6. The four components for interval estimation
- Examples discussed have concerned means, slopes, predictions and differences between means
- In each case the four components (parameter, estimate, standard error and confidence interval) can be identified
- Sometimes a decision is needed on whether to use z or t: check whether the degrees of freedom is greater or less than 30
Summary of four components
- For the population proportion always use the z-value
- For the population mean use t, but if d.f. > 30 you can use z
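Summary: Degrees of freedom formulas for t
- Population mean: n - 1
- Difference between two means (equal variances): n1 + n2 - 2
- Slope in simple linear regression: n - 2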