Four major statistical categories

Slides: 43

Provided by: unisanet

Transcript and Presenter's Notes

1
Four major statistical categories
Weeks 8 and 9
2
Week 9 objectives

Confidence intervals for population means
The t-distribution, and CIs based on t
CIs for differences in population means
Sample size determination for means
CIs for slope in simple linear regression
The four components of interval estimation for
means and slopes
Assumptions and Conditions

3
Estimation using results from a sample to
describe the population.

A manufacturer of designer jeans is using
advertising to develop an expensive and classy
image. The suggested retail price is $85.
The manufacturer is concerned that retailers are
undermining her image by offering the jeans at
discount prices. She needs to understand better
what is happening.
What is the average selling price of the jeans at
all retail stores?
She randomly samples retailers who sell her
product and determines their price.
A major car manufacturer with plants in a number
of countries wants to examine productivity in
Australia and Korea.
What is the difference in the average number of
cars produced between the plants in Australia and
Korea?
The daily output is recorded for one randomly
selected plant in Australia and a similar-sized
plant in Korea.

4
Lecture Example 1
Find a 95% CI for the long-run mean petrol
consumption of your car under city driving
conditions, when the average consumption is 8.62
km/litre, calculated from a random sample of 31
tanks of petrol with an estimated standard
deviation of 2.39 km/litre.
5
Review
The general confidence interval formula
For the general parameter $\theta$, the CI is
$\hat{\theta} \pm z \times \text{s.e.}(\hat{\theta})$, where the
z-value is chosen to give the desired confidence level
6
1. Application to CIs for a population mean
The parameter of interest is the population mean
$\mu$. The estimate of the parameter is $\bar{x}$. The
standard error of $\bar{x}$ is $s/\sqrt{n}$, where s is
the sample standard deviation.
7
Lecture Example 1, continued
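The working for this slide did not survive in the transcript. A minimal sketch of the large-sample calculation, assuming the usual 95% z-value of 1.96 (the function name is illustrative, not from the lecture):

```python
import math

def z_confidence_interval(mean, sd, n, z=1.96):
    """Large-sample CI for a population mean: mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)      # standard error of the sample mean
    half_width = z * se
    return mean - half_width, mean + half_width

# Lecture Example 1: 31 tanks, mean 8.62 km/litre, standard deviation 2.39 km/litre
lower, upper = z_confidence_interval(mean=8.62, sd=2.39, n=31)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) km/litre")
```

This reproduces the z-based interval of 7.78 to 9.46 km/litre quoted later in the lecture.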
8
2. A better CI: exact, even for small samples

The previous CI for a population mean is
approximate, and should only be used for large
sample sizes n
There is an alternative form which is accurate
even for small sample sizes
It uses a new distribution, the t-distribution
But it relies on the assumption that the data are
drawn from a normal distribution

9
Standardisation and Studentisation
The t-distribution was 'discovered' by W.S.
Gosset. He wrote under the pen-name of
'Student'. So the t-distribution is often called
'Student's t-distribution'.
$z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ is Standardisation;
$t = (\bar{x} - \mu)/(s/\sqrt{n})$ is Studentisation
10
What is the t-distribution?

The t-distribution looks like the standard normal
but it has thicker tails
The thicker tails represent more variability for
variables from a t-distribution
This extra variability arises because
Studentisation divides by a sample standard
deviation, which is a variable, not a constant
See the textbook for more details
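To put a number on "thicker tails", the sketch below compares the probability of falling outside ±1.96 under the standard normal and under t with 3 and 10 degrees of freedom. It assumes nothing beyond the standard t density and integrates it numerically with Simpson's rule; the function names are illustrative.

```python
import math

def t_pdf(x, k):
    """Density of Student's t with k degrees of freedom."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

def t_two_sided_tail(x, k, steps=2000):
    """P(|T| > x) for t with k d.f., using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    central = 2 * (h / 3) * total   # P(-x < T < x), by symmetry
    return 1 - central

normal_tail = math.erfc(1.96 / math.sqrt(2))   # P(|Z| > 1.96), about 0.05
tail_t3 = t_two_sided_tail(1.96, 3)
tail_t10 = t_two_sided_tail(1.96, 10)
print(f"normal: {normal_tail:.4f}  t10: {tail_t10:.4f}  t3: {tail_t3:.4f}")
```

The tail probability shrinks towards the normal value as the degrees of freedom grow.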

11
What are degrees of freedom?

Every t-distribution has a parameter k, the
degrees of freedom
This measures the amount of information about
variance (or standard deviation)
When k is large, the t-distribution is very like
the standard normal
Can you see why this makes sense?

12
What is the t-distribution?
pdfs of t-distributions: the t pdf is close to
the standard normal z even for 10 degrees of
freedom

[Figure: pdfs of Z, t3 and t10]

13
CIs using the t-distribution
Whenever a population standard deviation σ is
replaced by an estimated standard deviation s,
replace z-values by t-values: use
$\bar{x} \pm t_{n-1} \, s/\sqrt{n}$ in place of $\bar{x} \pm z \, s/\sqrt{n}$.
If the sample size n is large, then the t-value
is close to z and the estimation of σ by s
is accurate, so the z-based formula is OK.
14
What is the t-value in this CI
expression?
(It helps to draw a diagram)
Because the t-distribution has more variability
than the standard normal z, the critical values
for a 95% CI will exceed 1.96. But if n is large,
they will be close to 1.96
15
Critical values of t from Minitab
Calc > Probability Distributions > t, then
select degrees of freedom, probability level and
the Inverse Cumulative Probability option

Degrees of freedom k    97.5% point of t
    5                   2.5706
   10                   2.2281
   30                   2.0423
   50                   2.0086
  100                   1.9840
  500                   1.9647

As k increases these critical values approach
1.96, the corresponding z-value.
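For readers without Minitab, the critical values can be checked with the Python standard library alone. This is a rough sketch, not Minitab's algorithm: it integrates the t density by Simpson's rule and inverts the cumulative probability by bisection (function names are illustrative).

```python
import math

def t_pdf(x, k):
    """Density of Student's t with k degrees of freedom."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

def t_cdf(x, k, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the Simpson's-rule integral over [0, x]."""
    h = x / steps
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + (h / 3) * total

def t_quantile(p, k):
    """Inverse cumulative probability for p > 0.5, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical_values = {k: t_quantile(0.975, k) for k in (5, 10, 30, 50, 100, 500)}
for k, value in critical_values.items():
    print(f"k = {k:3d}: 97.5% point of t = {value:.4f}")
```

The printed values can be compared with the Minitab critical values for the same degrees of freedom.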
16
When to use z, not t?

A convenient rule of thumb is that when the
degrees of freedom k is greater than 30, the
t-distribution is close enough to the standard
normal z that the refinement of using t is
unnecessary.

17
Vending machine profits
Example (9.3.1) of the textbook
A take-away store installs a vending machine.
During the first seven weeks of operation, its
profit figures ($) are 390, 377, 402, 363,
415, 393 and 387. Estimate and give a 90%
confidence interval for the long-run mean weekly
profit.

Variable   N    Mean   StDev  SE Mean
profits    7  389.57   16.75     6.33
18
Example (9.3.1), continued
$\bar{x} \pm t_{n-1} \, s/\sqrt{n}$, with $n - 1 = 6$ degrees of freedom
(from the formula for CIs for a population mean)
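The slide's arithmetic is missing from the transcript. A sketch of the calculation from the raw profit figures follows; the t-value 1.9432 (95th percentile of t with 6 degrees of freedom, which gives a two-sided 90% interval) is taken from standard tables and is not quoted in the lecture.

```python
import math
import statistics

profits = [390, 377, 402, 363, 415, 393, 387]   # weekly profits ($)

n = len(profits)
mean = statistics.mean(profits)
sd = statistics.stdev(profits)      # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(n)              # standard error of the mean

T_6_90 = 1.9432                     # assumption: standard-table t-value, 6 d.f., 90% CI
lower = mean - T_6_90 * se
upper = mean + T_6_90 * se

print(f"mean = {mean:.2f}, StDev = {sd:.2f}, SE mean = {se:.2f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

The summary statistics agree with the Minitab figures for this example (mean 389.57, StDev 16.75, SE Mean 6.33).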
19
or by Minitab
(Stat > Basic Statistics > 1-Sample t, then
select the confidence interval option)

T Confidence Intervals
Variable   N    Mean   StDev  SE Mean          90.0% CI
profits    7  389.57   16.75     6.33  (377.27, 401.87)
20
Lecture Exercise 1

Find a 95% CI for the long-run mean petrol
consumption of Example 1 using the t-distribution.
Choose the appropriate t-value from the Minitab
output below.
How does this CI compare to the one obtained
using the z-value?

Student's t distribution with 30 DF
P( X <= x )        x
   0.9000     1.3104
   0.9500     1.6973
   0.9750     2.0423
   0.9900     2.4573
   0.9950     2.7500
21
Solution
We are 95% confident that the average fuel
consumption is between 7.74 and 9.50
km/litre. (The previous answer, using z, was
7.78 to 9.46.)
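A short sketch of the working behind this solution, using the 0.9750 row of the 30 DF Minitab output (2.0423) and, for comparison, the usual z-value of 1.96:

```python
import math

mean, sd, n = 8.62, 2.39, 31
se = sd / math.sqrt(n)              # standard error of the mean

T_30 = 2.0423                       # 97.5% point of t with n - 1 = 30 d.f.
Z = 1.96                            # 97.5% point of the standard normal

t_ci = (mean - T_30 * se, mean + T_30 * se)
z_ci = (mean - Z * se, mean + Z * se)

print(f"t-based 95% CI: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")
print(f"z-based 95% CI: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
print(f"t interval is wider by a factor of {T_30 / Z:.3f}")
```

The t interval is slightly wider, which is the price paid for estimating the standard deviation from the sample.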
22
3. Confidence interval for a difference between
two population means
23
How to find a confidence interval for the
difference between two population means?
24
Part of Minitab output: CIs for differences
between means, see Textbook Example (9.4.1)
Which assumes equal variances? Answer: the first
25
How to decide between equal and unequal variances?
26
Lecture Exercise 2

For the Textbook Example (9.4.1), verify the 90%
CI for the difference between means assuming
equal variances given by Minitab.
The formula for the standard error in this case
is $s\sqrt{1/n_1 + 1/n_2}$,
where s is the pooled StDev from Minitab.

Student's t distribution with 8 DF
P( X <= x )        x
   0.9000     1.3968
   0.9500     1.8595
   0.9750     2.3060
   0.9900     2.8965
   0.9950     3.3554
27
Solution

The degrees of freedom is n1 + n2 - 2 = 5 + 5 -
2 = 8, which is less than 30, so we use t.
For a 90% CI choose the t8-value 1.8595.
The standard error is $s\sqrt{1/5 + 1/5}$, where s
is the pooled StDev.
Then the CI is (difference between the sample
means) ± 1.8595 × (standard error),

and the confidence interval is (-14.5, -3.5).
Thus we are 90% confident that for an office
worker leaving home early, the average travel
time to work is between 3.5 and 14.5 minutes
shorter than for an office worker leaving home
later.
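The sample means and pooled StDev from the textbook's Minitab output are not reproduced in this transcript, so the interval cannot be rebuilt from raw figures here. As a consistency check, the sketch below works backwards from the reported interval to the quantities it implies, assuming the standard pooled standard error formula s × sqrt(1/n1 + 1/n2). The implied values are derived, not quoted from the textbook.

```python
import math

lower, upper = -14.5, -3.5      # reported 90% CI (minutes)
T_8 = 1.8595                    # 95% point of t with 8 d.f. (from the Minitab table)
n1 = n2 = 5

diff = (lower + upper) / 2                  # implied difference between sample means
half_width = (upper - lower) / 2            # implied t * s.e.
se = half_width / T_8                       # implied standard error
pooled_sd = se / math.sqrt(1 / n1 + 1 / n2) # implied pooled StDev

print(f"difference = {diff:.1f}, s.e. = {se:.3f}, pooled StDev = {pooled_sd:.3f}")

# Forward check: rebuilding the interval from the implied quantities
rebuilt = (diff - T_8 * se, diff + T_8 * se)
print(f"rebuilt CI: ({rebuilt[0]:.1f}, {rebuilt[1]:.1f})")
```

If the pooled StDev in the textbook output is close to the implied value, the Minitab interval is verified.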

28
4. Sample size determination formula

For a confidence interval of accuracy e (that
is, estimate ± e) and a given confidence level,
the required sample size is

$n = z^2 \sigma^2 / e^2$

where $\sigma^2$ is the population variance and z is
the z-value for the chosen confidence level.
Some prior information about $\sigma^2$ is essential.

29
Textbook Example (9.6.1)

A sample of only 7 observations produced a 90% CI
for average weekly soft drink machine profit, of
width about $25.
The standard deviation was $16.75.
To obtain a more precise estimate of weekly
profit, i.e. a 99% CI with accuracy ±$5 (width
$10), how many additional observations will be
needed?

30
Example (9.6.1), continued
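The textbook's working is not in the transcript. A sketch of the calculation, assuming the usual formula n = (z × σ / e)² rounded up, the standard 99% z-value of 2.5758, and the sample standard deviation of 16.75 as the prior estimate of σ:

```python
import math

def required_sample_size(sd, accuracy, z):
    """Smallest n with z * sd / sqrt(n) <= accuracy, i.e. ceil((z * sd / accuracy)^2)."""
    return math.ceil((z * sd / accuracy) ** 2)

Z_99 = 2.5758                   # assumption: standard-table z-value for 99% confidence
already_observed = 7

n_required = required_sample_size(sd=16.75, accuracy=5, z=Z_99)
additional = n_required - already_observed

print(f"required n = {n_required}, additional observations = {additional}")
```

Under these assumptions about 75 weeks of data are needed in total, so roughly 68 more than the 7 already collected. The answer is sensitive to the prior estimate of σ, which here rests on only 7 observations.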
31
Lecture Exercise 3

What sample size is needed to obtain a 90%
confidence interval for an unknown population
mean, with accuracy ±3 units, where the population
standard deviation is known from long past
experience to be around 5.5 units?

32
Solution
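The solution slide's working is missing. One way to set it out, assuming the usual sample size formula and the standard 90% z-value of 1.645 (neither survives in this transcript):

```latex
n = \frac{z^2 \sigma^2}{e^2}
  = \left( \frac{1.645 \times 5.5}{3} \right)^2
  \approx 3.016^2
  \approx 9.1
```

Rounding up to the next whole number gives a sample size of 10.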
33
5. CIs for slope in simple linear regression
34
Minitab output for s.e. of the slope estimate:
selling price of houses (y) and LGA (local
government area) valuation (x). Units are $000.
The standard error of the slope estimate is 0.107
and the sample size is 8.
35
Lecture Example 2
Find a 95% confidence interval for the slope in
the LGA example.

Answer: The degrees of freedom is n - 2 = 6, which
is less than 30, so use t. The t-value from Minitab
is

Student's t distribution with 6 DF
P( X <= x )        x
   0.9750     2.4469

The CI is b1 ± t6 × s.e.(b1), i.e. 1.285 ±
2.4469 × 0.107. So the CI is (1.023, 1.547).
We are 95% confident that the true slope is
between 1.023 and 1.547.
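The same arithmetic as a short sketch, using only the figures quoted in this example (variable names are illustrative):

```python
b1 = 1.285          # slope estimate
se_b1 = 0.107       # standard error of the slope estimate
n = 8               # sample size
df = n - 2          # degrees of freedom for a slope in simple linear regression
T_6 = 2.4469        # 97.5% point of t with 6 d.f. (Minitab)

half_width = T_6 * se_b1
lower, upper = b1 - half_width, b1 + half_width

print(f"d.f. = {df}, 95% CI for slope: ({lower:.3f}, {upper:.3f})")
```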
36
Confidence intervals for predictions in simple
linear regression

Use Minitab to carry out regression, using
Stat > Regression > Regression, as usual.
Under Options, enter the value of x under
Prediction intervals for new observations.
Select the confidence level.
Minitab calculates the predicted expected
value, its standard error, and a confidence
interval.

37
Lecture Example 3
How to find a 95% CI for the predicted average
house selling price when LGA = $140,000 in Example
(5.4.1)?
Follow the steps on the previous slide. Minitab
output is as follows

Predicted Values
   Fit   StDev Fit           95.0% CI
185.51        5.83   (171.23, 199.78)

So the required CI is $171,000 to $200,000.
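As a rough check on the Minitab interval, the sketch below applies the same estimate ± t × standard error pattern to the fitted value. It assumes the regression has 6 degrees of freedom (sample size 8, as in the slope example), so the t-value is 2.4469; the small difference from Minitab's limits comes from using the rounded StDev Fit.

```python
fit = 185.51        # predicted mean selling price ($000)
se_fit = 5.83       # StDev Fit from Minitab, rounded to two decimals
T_6 = 2.4469        # assumption: 97.5% point of t with 6 d.f.

half_width = T_6 * se_fit
lower, upper = fit - half_width, fit + half_width

print(f"95% CI for predicted mean: ({lower:.2f}, {upper:.2f}) $000")
```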
38
7. Conditions and Assumptions for means

Well-defined continuous variable
Representative sample
A large sample size or normally distributed
variable
Independence
For the sample size or normality condition: if
d.f. > 30, central limit effects make this OK.
Or if the sample comes from a normal distribution,
this condition is met no matter how small the
sample size is.

39
Some conditions are usually satisfied easily

such as the well-defined variables, the
representative sample, and independence
conditions
Having degrees of freedom > 30 helps central
limit effects to make the normality condition OK
Having equal sample sizes reduces any bad effects
of wrongly assuming equal variances, and also
helps central limit effects to make the normality
condition OK.

40
6. The four components for interval estimation

Examples discussed have concerned means, slopes,
predictions and differences between means
In each case the four components (parameter,
estimate, standard error and confidence interval)
can be identified
Sometimes a decision is needed on whether to use
z or t: see whether the degrees of freedom is
greater or less than 30

41
Summary of four components

For a population proportion, always use the z-value
For a population mean, use t, but if the d.f. are
> 30 you can use z

42
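Summary: Degrees of freedom formulas for t

Population mean: n - 1
Difference between two means (equal variances): n1 + n2 - 2
Slope in simple linear regression: n - 2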
