Analysis of Variance - PowerPoint PPT Presentation

About This Presentation

Title:

Analysis of Variance

Description:

Chapter 15 Analysis of Variance – PowerPoint PPT presentation

Slides: 46

Provided by: sbae9

Learn more at: https://personal.morris.umn.edu

Transcript and Presenter's Notes

1
Analysis of Variance

Chapter 15

2
15.1 Introduction

Analysis of variance compares two or more
populations of interval data.
Specifically, we are interested in determining
whether differences exist between the population
means.
The procedure works by analyzing the sample
variance.

3
15.2 One Way Analysis of Variance

The analysis of variance is a procedure that
tests whether differences exist
between two or more population means.
To do this, the technique analyzes the sample
variances.

4
One Way Analysis of Variance

Example 15.1
An apple juice manufacturer is planning to
develop a new product: a liquid concentrate.
The marketing manager has to decide how to market
the new product.
Three strategies are considered:
Emphasize the convenience of using the product.
Emphasize the quality of the product.
Emphasize the product's low price.

5
One Way Analysis of Variance

Example 15.1 - continued
An experiment was conducted as follows:
In three cities an advertising campaign was
launched.
In each city only one of the three
characteristics (convenience, quality, and price)
was emphasized.
The weekly sales were recorded for twenty weeks
following the beginning of the campaigns.

6
One Way Analysis of Variance

See file Xm15-01 (three columns of weekly sales,
one per city).
7
One Way Analysis of Variance

Solution
The data are interval.
The problem objective is to compare sales in
three cities.
We hypothesize that the three population means
are equal.

8
Defining the Hypotheses

Solution

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ.
To build the statistic needed to test the
hypotheses, use the following notation.
9
Notation
Independent samples are drawn from k populations
(treatments).
Sample 1: $x_{11}, x_{21}, \ldots, x_{n_1,1}$
Sample 2: $x_{12}, x_{22}, \ldots, x_{n_2,2}$
Sample k: $x_{1k}, x_{2k}, \ldots, x_{n_k,k}$
Sample size: $n_j$
Sample mean: $\bar{x}_j$
X is the response variable. The variable's
values are called responses.
10
Terminology

In the context of this problem:
Response variable: weekly sales.
Responses: actual sales values.
Experimental unit: the weeks in the three cities
when we record sales figures.
Factor: the criterion by which we classify the
populations (the treatments). In this problem the
factor is the marketing strategy.
Factor levels: the population (treatment) names.
In this problem the factor levels are the
marketing strategies.

11
The rationale of the test statistic
Two types of variability are employed when
testing for the equality of the population means.
12
Graphical demonstration: employing two types of
variability
13
[Figure: responses for Treatment 1, Treatment 2,
and Treatment 3, shown with small and with large
within-sample variability.]
A small variability within the samples makes it
easier to draw a conclusion about the population
means.
The sample means are the same as before, but the
larger within-sample variability makes it harder
to draw a conclusion about the population means.
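The point of this demonstration can be checked numerically. The sketch below uses made-up data (not taken from the slides): two sets of three samples with identical sample means but different spread. The helper `f_ratio` is a hypothetical name; it computes the ratio of between-treatments to within-treatments mean squares that the later slides call F = MST/MSE.

```python
from statistics import mean

def f_ratio(samples):
    """Return MST / MSE for a list of samples (one list per treatment)."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    # Between-treatments variation (SST) and within-treatments variation (SSE).
    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (sst / (k - 1)) / (sse / (n - k))

# Made-up data: the sample means are 10, 15 and 20 in both cases.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]   # small within-sample spread
wide = [[5, 10, 15], [10, 15, 20], [15, 20, 25]]    # large within-sample spread

f_tight = f_ratio(tight)
f_wide = f_ratio(wide)
print(f_tight, f_wide)
```

With the same sample means, the sample set with the larger within-sample spread yields the smaller ratio, which is why it is harder to draw a conclusion from it.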
14
The rationale behind the test statistic I

If the null hypothesis is true, we would expect
all the sample means to be close to one another
(and as a result, close to the grand mean).
If the alternative hypothesis is true, at least
some of the sample means would differ.
Thus, we measure variability between sample
means.

15
Variability between sample means

The variability between the sample means is
measured as the sum of squared distances between
each mean and the grand mean.
This sum is called the
Sum of Squares for Treatments
SST

In our example treatments are represented by the
different advertising strategies.
16
Sum of squares for treatments (SST)
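$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$
where k is the number of treatments, $\bar{x}_j$
is the mean of sample j, $n_j$ is the size of
sample j, and $\bar{\bar{x}}$ is the grand mean.
Note: When the sample means are close to one
another, their distance from the grand mean is
small, leading to a small SST. Thus, a large SST
indicates large variation between sample means,
which supports H1.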
17
Sum of squares for treatments (SST)

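Solution continued: Calculate SST

The grand mean is calculated by
$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = 613.07$
$SST = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57,512.23$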
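The SST calculation on this slide can be reproduced from the three sample means and sample sizes it quotes. The following is a minimal sketch; the variable names are illustrative and the only inputs are the figures shown above.

```python
# Sample means and sample sizes quoted on the slide (three cities, 20 weeks each).
sample_means = [577.55, 653.00, 608.65]
sample_sizes = [20, 20, 20]

# Grand mean: size-weighted average of the sample means.
n_total = sum(sample_sizes)
grand_mean = sum(n * m for n, m in zip(sample_sizes, sample_means)) / n_total

# SST: size-weighted squared distances between each sample mean and the grand mean.
sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sample_sizes, sample_means))

print(f"grand mean = {grand_mean:.2f}")
print(f"SST = {sst:.2f}")
```

Up to rounding, this should agree with the slide's grand mean of 613.07 and SST of 57,512.23.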
18
Sum of squares for treatments (SST)

Is SST = 57,512.23 large enough to reject H0 in
favor of H1? See next.

19
The rationale behind test statistic II

Large variability within the samples weakens the
ability of the sample means to represent their
corresponding population means.
Therefore, even though sample means may markedly
differ from one another, SST must be judged
relative to the within samples variability.

20
Within samples variability

The variability within samples is measured by
adding all the squared distances between
observations and their sample means.
This sum is called the
Sum of Squares for Error
SSE

In our example this is the sum of all squared
differences between sales in city j and
the sample mean of city j (over all the three
cities).
21
Sum of squares for errors (SSE)

Solution continued: Calculate SSE

$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$
$= (20 - 1)10,774.44 + (20 - 1)7,238.61 + (20 - 1)8,670.24 = 506,983.50$
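The same calculation can be sketched in a few lines, using only the sample variances and sample sizes quoted on this slide. Because those variances are rounded to two decimals, the result comes out about one unit below the slide's 506,983.50; the slide's figure was presumably computed from unrounded variances, which is an assumption here.

```python
# Sample variances and sample sizes as quoted on the slide.
sample_variances = [10774.44, 7238.61, 8670.24]
sample_sizes = [20, 20, 20]

# SSE = sum over samples of (n_j - 1) * s_j^2.
sse = sum((n - 1) * s2 for n, s2 in zip(sample_sizes, sample_variances))

print(f"SSE = {sse:.2f}")
```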
22
Sum of squares for errors (SSE)

Is SST = 57,512.23 large enough relative to SSE =
506,983.50 to reject the null hypothesis that
specifies that all the means are equal?

23
The mean sum of squares
To perform the test we need to calculate the mean
squares as follows:
$MST = \frac{SST}{k - 1}$
$MSE = \frac{SSE}{n - k}$
24
Calculation of the test statistic
$F = \frac{MST}{MSE}$
with the following degrees of freedom: $\nu_1 = k - 1$
and $\nu_2 = n - k$.
Required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
25
The F test rejection region
And finally, the hypothesis test:
Reject H0 if $F > F_{\alpha, k-1, n-k}$
26
The F test
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ.
Test statistic: F = MST / MSE = 3.23
Since 3.23 > 3.15, there is sufficient evidence
to reject H0 in favor of H1, and argue that at
least one of the mean sales is different from
the others.
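The test can be retraced from the sums of squares given on the earlier slides. The sketch below is not the slides' own procedure: instead of a table lookup it computes the 5% critical value from a closed form that holds only when the numerator degrees of freedom equal 2, which is the case here (k - 1 = 2). The value n = 60 assumes 3 cities with 20 weeks each.

```python
sst, sse = 57512.23, 506983.50   # sums of squares from the slides
k, n = 3, 60                     # 3 treatments; 3 x 20 observations (assumed)

mst = sst / (k - 1)              # mean square for treatments
mse = sse / (n - k)              # mean square for error
f_stat = mst / mse

# Critical value for alpha = 0.05 with (2, n - k) degrees of freedom.
# Closed form valid ONLY for numerator df = 2:
#   P(F > x) = (1 + 2x/d2) ** (-d2/2), solved for x at probability alpha.
alpha = 0.05
d2 = n - k
f_crit = (d2 / 2) * (alpha ** (-2 / d2) - 1)

reject_h0 = f_stat > f_crit
print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, reject H0: {reject_h0}")
```

The exact critical value for 57 denominator degrees of freedom comes out slightly above the slide's 3.15, which looks like a table entry for a nearby number of degrees of freedom. The decision to reject H0 is the same either way.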
27
The F test p-value

Use Excel to find the p-value:
fx Statistical
FDIST(3.23,2,57) = .0467

p-value = P(F > 3.23) = .0467
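Outside Excel, the same tail probability can be approximated without a statistics library, because the F distribution has a simple closed-form tail when the numerator degrees of freedom equal 2. The function name `f_tail_df1_is_2` is hypothetical and the formula does not apply to other numerator degrees of freedom.

```python
def f_tail_df1_is_2(x, d2):
    """P(F > x) for an F distribution with 2 and d2 degrees of freedom."""
    return (1 + 2 * x / d2) ** (-d2 / 2)

p_value = f_tail_df1_is_2(3.23, 57)
print(f"p-value = {p_value:.4f}")
```

The result should land very close to the slide's .0467. A small difference in the last digit is expected if the slide's figure was computed from an unrounded F statistic. In either case the p-value is below .05.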
28
Excel single factor ANOVA
Xm15-01.xls
SS(Total) = SST + SSE
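The partition of the total sum of squares into SST and SSE can be verified on any data set. The sketch below uses made-up numbers (not the Xm15-01 data), and the treatment labels are only illustrative.

```python
from statistics import mean

# Made-up responses for three treatments (NOT the Xm15-01 data).
samples = {
    "convenience": [10, 12, 14],
    "quality": [15, 18, 21],
    "price": [9, 11, 10],
}
all_values = [x for s in samples.values() for x in s]
grand_mean = mean(all_values)

# Total variation of every observation around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
# Variation between sample means (treatments).
sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples.values())
# Variation of observations around their own sample mean (error).
sse = sum((x - mean(s)) ** 2 for s in samples.values() for x in s)

print(ss_total, sst + sse)
```

The two printed values agree up to floating-point rounding.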
29
15.3 Analysis of Variance Experimental Designs

Several elements may distinguish between one
experimental design and others.
The number of factors.
Each characteristic investigated is called a
factor.
Each factor has several levels.

30
One-way ANOVA: single factor
Two-way ANOVA: two factors
[Figure: for one-way ANOVA, the response under
Treatments 1 to 3 (the three levels of the single
factor); for two-way ANOVA, the response under
Levels 1 to 3 of Factor A and Levels 1 and 2 of
Factor B.]
31
Independent samples or blocks

Groups of matched observations are formed into
blocks, in order to remove the effects of
unwanted variability.
By doing so we improve the chances of detecting
the variability of interest.

32
Models of Fixed and Random Effects

Fixed effects
If all possible levels of a factor are included
in our analysis we have a fixed effect ANOVA.
The conclusion of a fixed effect ANOVA applies
only to the levels studied.
Random effects
If the levels included in our analysis represent
a random sample of all the possible levels, we
have a random-effect ANOVA.
The conclusion of the random-effect ANOVA applies
to all the levels (not only those studied).

33
Models of Fixed and Random Effects.

In some ANOVA models the test statistic of the
fixed effects case may differ from the test
statistic of the random effect case.
Fixed and random effects: examples
Fixed effects: the advertisement example
(15.1). All the levels of the marketing
strategies were included.
Random effects: to determine if there is a
difference in the production rate of 50 machines,
four machines are randomly selected and their
production recorded.

34
15.4 Randomized Blocks (Two-way) Analysis of
Variance

The purpose of designing a randomized block
experiment is to reduce the within-treatments
variation, thus increasing the relative amount of
between-treatments variation.
This helps in detecting differences between the
treatment means more easily.

35
Randomized Blocks
Block all the observations with some commonality
across treatments.
[Figure: observations arranged by Treatments 1 to 4
and Blocks 1 to 3.]
37
Partitioning the total variability

The total sum of squares is partitioned into three
sources of variation:
Treatments
Blocks
Within samples (Error)

Recall: for the independent samples design we have
SS(Total) = SST + SSE
For the randomized block design:
SS(Total) = SST + SSB + SSE
38
Calculating the sums of squares

Formulas for the calculation of the sums of
squares
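The formulas themselves did not survive in this transcript. As a hedged illustration, the sketch below uses the standard randomized block definitions (an assumption, not copied from the slides) on a small made-up table with one observation per treatment in each block, and checks that SS(Total) = SST + SSB + SSE.

```python
# Made-up responses: rows are blocks, columns are treatments.
data = [
    [6.0, 8.0, 10.0],
    [5.0, 6.0, 10.0],
    [3.0, 4.0, 8.0],
    [2.0, 6.0, 7.0],
]
b = len(data)        # number of blocks
k = len(data[0])     # number of treatments
n = b * k            # total number of observations

grand_mean = sum(sum(row) for row in data) / n
treatment_means = [sum(row[j] for row in data) / b for j in range(k)]
block_means = [sum(row) / k for row in data]

ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
# Treatments: each treatment mean is based on b observations.
sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
# Blocks: each block mean is based on k observations.
ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
# Error: what remains after removing treatment and block effects.
sse = sum(
    (data[i][j] - treatment_means[j] - block_means[i] + grand_mean) ** 2
    for i in range(b)
    for j in range(k)
)

print(ss_total, sst + ssb + sse)
```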


40
Mean Squares

To perform hypothesis tests for treatments and
blocks we need:
Mean square for treatments: $MST = \frac{SST}{k - 1}$
Mean square for blocks: $MSB = \frac{SSB}{b - 1}$
Mean square for error: $MSE = \frac{SSE}{n - k - b + 1}$

41
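Test statistics for the randomized block design
ANOVA
Test statistic for treatments: $F = \frac{MST}{MSE}$
Test statistic for blocks: $F = \frac{MSB}{MSE}$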
42
The F test rejection regions

Testing the mean responses for treatments:
$F > F_{\alpha, k-1, n-k-b+1}$
Testing the mean responses for blocks:
$F > F_{\alpha, b-1, n-k-b+1}$
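The two test statistics can be sketched as a small function. It assumes the standard randomized block definitions (each mean square is a sum of squares divided by its degrees of freedom, and each F is a mean square divided by MSE), with degrees of freedom matching the rejection regions above. The function name and the input values are hypothetical.

```python
def block_f_statistics(sst, ssb, sse, k, b):
    """Return (F for treatments, F for blocks) in a randomized block design."""
    n = k * b                          # one observation per treatment per block
    mst = sst / (k - 1)                # mean square for treatments
    msb = ssb / (b - 1)                # mean square for blocks
    mse = sse / (n - k - b + 1)        # mean square for error
    return mst / mse, msb / mse

# Made-up sums of squares for k = 3 treatments and b = 4 blocks.
f_treatments, f_blocks = block_f_statistics(sst=60.0, ssb=30.0, sse=12.0, k=3, b=4)
print(f_treatments, f_blocks)
```

Each statistic is then compared with its own critical value, using k - 1 or b - 1 numerator degrees of freedom and n - k - b + 1 denominator degrees of freedom.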

43
Randomized Blocks ANOVA - Example

Example 15.2
Are there differences in the effectiveness of
cholesterol reduction drugs?
To answer this question the following experiment
was organized
25 groups of men with high cholesterol were
matched by age and weight. Each group consisted
of 4 men.
Each person in a group received a different drug.
The cholesterol level reduction in two months was
recorded.
Can we infer from the data in Xm15-02 that there
are differences in mean cholesterol reduction
among the four drugs?

44
Randomized Blocks ANOVA - Example

Solution
Each drug can be considered a treatment.
The 4 records in each group can be treated as a
block, because they are matched by age and weight.
This procedure eliminates the variability in
cholesterol reduction related to different
combinations of age and weight.
This helps detect differences in the mean
cholesterol reduction attributed to the different
drugs.
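For this example, the degrees of freedom follow directly from the design described above: 4 drugs and 25 matched groups of 4 men. A minimal sketch, assuming one observation per drug in each group:

```python
k = 4                 # treatments: the four drugs
b = 25                # blocks: the 25 matched groups
n = k * b             # one observation per drug in each group

df_treatments = k - 1
df_blocks = b - 1
df_error = n - k - b + 1

print(df_treatments, df_blocks, df_error)
```

The three values add up to n - 1, the degrees of freedom of SS(Total).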