5. Analysis of Variance IV / Análisis de Varianza IV


Provided by: SWil163

1
5. Analysis of Variance IV / Análisis de Varianza IV

Profesor Simon Wilson
Departamento de Estadística y Econometría

2
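Estimating $\mu$ when H0 is accepted

If we accept H0, then all levels have a common
mean $\mu$.
Its estimate is the overall sample mean
$\hat{\mu} = \bar{y}_{..} = \frac{1}{n} \sum_{i=1}^{I} \sum_{j=1}^{n_i} y_{ij}$
Then we can estimate $\sigma^2$ by
$\hat{s}^2 = \frac{1}{n-1} \sum_{i=1}^{I} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2$
This has n - 1 degrees of freedom.
Use this to construct confidence intervals for $\mu$
and $\sigma^2$ in the usual manner.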
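As a minimal sketch of these two estimates, assuming Python and a small made-up pooled sample (the values in `y` are hypothetical, not the whisky data), the common mean is estimated by the overall sample mean and the variance by the sum of squared deviations divided by n - 1:

```python
from statistics import mean, variance

# Hypothetical observations pooled from all groups (not the whisky data).
y = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]

n = len(y)
mu_hat = mean(y)       # estimate of the common mean: overall sample mean
s2_hat = variance(y)   # sum of squared deviations divided by n - 1
df = n - 1             # degrees of freedom of the variance estimate

print(mu_hat, s2_hat, df)
```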

3
The coefficient of determination

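This is a measure of the relative size of the
variability explained by the groups / una medida
relativa de la variabilidad explicada por los
grupos:
$R^2 = \frac{\sum_{i=1}^{I} n_i (\bar{y}_{i.} - \bar{y}_{..})^2}{\sum_{i=1}^{I} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2}$
It is a number between 0 and 1.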
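A short sketch of this ratio, assuming the usual definition (variability between group means divided by total variability) and three made-up groups; the `groups` data and all variable names are illustrative only:

```python
from statistics import mean

# Hypothetical data: three groups of three observations each.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 16.0],
    "C": [20.0, 19.0, 21.0],
}

all_obs = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_obs)

# Variability explained by the groups: spread of group means around the grand mean.
explained = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
# Total variability: spread of every observation around the grand mean.
total = sum((y - grand_mean) ** 2 for y in all_obs)

r_squared = explained / total
print(round(r_squared, 4))
```

Because the explained variability can never exceed the total, the ratio always falls between 0 and 1.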

4
The efficiency / eficacia of the F test (1)

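How good is this test?
To simplify, assume that each group has m
observations (so n = m I). Then
$F = \frac{m \, s_m^2}{\hat{s}_R^2}$
where $s_m^2 = \frac{1}{I-1} \sum_{i=1}^{I} (\bar{y}_{i.} - \bar{y}_{..})^2$ is the variance between the means
and $\hat{s}_R^2$ is the residual variance (the estimate of $\sigma^2$).
Clearly, as m increases, F increases, so we are
more likely to reject H0 as we increase m.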

5
The efficiency of the F test (2)

On the other hand, keeping m fixed, F increases
if the denominator decreases.
This happens if the experimental error $\sigma$ is
smaller (since the denominator estimates $\sigma^2$).
So the power of the F test increases when we
increase the size of the data from each group or
are able to reduce the experimental error.
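The effect of the group size can be illustrated with a small sketch. It assumes equal group sizes and computes F as m times the variance between the group means, divided by the pooled within-group (residual) variance. The data are made up, and duplicating each group is only a crude way to mimic "more observations with the same means and spread":

```python
from statistics import mean, variance

def f_statistic_balanced(groups):
    """F = m * s_m^2 / s_R^2 for groups that all have m observations."""
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("all groups must have the same size")
    num_groups = len(groups)
    group_means = [mean(g) for g in groups]
    s_m2 = variance(group_means)  # variance between the group means
    # Residual variance: pooled within-group sum of squares over n - I.
    ss_within = sum((y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g)
    s_r2 = ss_within / (m * num_groups - num_groups)
    return m * s_m2 / s_r2

# Hypothetical data: three groups of three observations.
groups = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [20.0, 19.0, 21.0]]
f_small = f_statistic_balanced(groups)
# Same group means and spread, but twice as many observations per group.
f_large = f_statistic_balanced([g + g for g in groups])

print(f_small, f_large)
```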

6
Analysis of the Difference in Means

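Let us suppose that we have rejected the null
hypothesis. We believe that the means of the
groups are different.
The 100(1 - $\alpha$)% confidence interval for the
difference between two means $\mu_i$ and $\mu_j$ is
$(\bar{y}_{i.} - \bar{y}_{j.}) \pm t_{\alpha/2;\,n-I} \, \hat{s}_R \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$
where $\hat{s}_R^2$ is the residual variance.
Note: the t distribution has n - I degrees of freedom.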
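As a sketch of this interval, assuming the standard form (difference of sample means) ± t × sqrt(residual variance × (1/n_i + 1/n_j)): the function name, its parameters and the numerical inputs are all hypothetical, and the t value is passed in as it would be read from tables.

```python
from math import sqrt

def diff_means_ci(mean_i, mean_j, n_i, n_j, s_r2, t_quantile):
    """Interval (mean_i - mean_j) +/- t * sqrt(s_r2 * (1/n_i + 1/n_j))."""
    diff = mean_i - mean_j
    half_width = t_quantile * sqrt(s_r2 * (1.0 / n_i + 1.0 / n_j))
    return diff - half_width, diff + half_width

# Hypothetical inputs: n = 9 observations in I = 3 groups of 3, residual
# variance 1.0, and the tabulated two-sided 95% t value for n - I = 6
# degrees of freedom (2.447).
low, high = diff_means_ci(15.0, 11.0, 3, 3, 1.0, 2.447)
print(round(low, 3), round(high, 3))
```

If the interval does not contain 0, the two means differ at the corresponding level of significance.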

7
Analysis of the Difference in Means: multiple
tests (1)

Remember that we can do a t-test to see if any
two of the means are different (recall the start of
the whisky example).
There are $\binom{I}{2} = I(I-1)/2$ such pairs of means.
If H0 is true, and the level of significance is 5%,
we accept H0 for each test with probability 0.95.
The probability that we accept H0 for all these
tests is $0.95^{I(I-1)/2}$.

8
Analysis of the Difference in Means: multiple
tests (2)

This assumes that all the tests are independent.
So even when H0 is true, when we do multiple
tests on pairs of means, we accept H0 as true
only with probability 0.45 and not 0.95.
As $I \to \infty$, this probability $\to 0$.
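A small sketch of how quickly this probability falls as the number of groups grows, under the independence assumption stated above. The function name and the group counts in the loop are illustrative choices:

```python
from math import comb

def prob_accept_all(num_groups, alpha=0.05):
    """P(no pairwise test rejects) when H0 is true and tests are independent."""
    num_tests = comb(num_groups, 2)  # I(I-1)/2 pairs of means
    return (1.0 - alpha) ** num_tests

for num_groups in (2, 3, 6, 10):
    print(num_groups, comb(num_groups, 2), round(prob_accept_all(num_groups), 3))
```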

9
The Bonferroni Method (1)

The Bonferroni method is a way to solve this
problem of multiple tests.
It calculates a level of significance $\alpha$ for each
test on pairs of means, such that the probability
that H0 is accepted for all tests is at least 1 - $\alpha_T$
($\alpha_T$ is the level of significance that we want).
If c is the total number of tests, then we want
$\alpha_T$ = P(reject H0 at least once)
= P(reject H0 in test 1 OR reject H0 in test 2 OR ... OR reject H0 in test c)
$\le$ P(reject H0 in test 1) + ... + P(reject H0 in test c)
= c $\alpha$

10
The Bonferroni Method (2)

So $\alpha = \alpha_T / c$ works.
Of course, when c is large, this means $\alpha$ is very
small, and the value of $t_{\alpha/2}$ is not in the
tables.
In this case we use an approximation based on
$z_\alpha$ (from normal tables).

11
The Bonferroni Method (3)

So, when the F test rejects H0 at the level of
significance $\alpha_T$, we can further investigate each
pair of means using the t-test at level of
significance $\alpha$.
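The per-test level can be sketched as follows. The number of groups is a hypothetical choice, and the code only computes the Bonferroni level and the standard normal quantile that a normal-based approximation to the t value would start from; it does not reproduce the approximation formula itself:

```python
from math import comb
from statistics import NormalDist

def bonferroni_alpha(alpha_total, num_tests):
    """Per-test significance level: overall level divided by the number of tests."""
    return alpha_total / num_tests

num_groups = 3                    # hypothetical number of groups
num_tests = comb(num_groups, 2)   # c: number of pairwise tests
alpha = bonferroni_alpha(0.05, num_tests)

# Two-sided standard normal quantile at the per-test level.
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
print(num_tests, round(alpha, 4), round(z_crit, 2))
```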

12
Confidence Interval for the Variance

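Because
$\frac{(n-I)\,\hat{s}_R^2}{\sigma^2} \sim \chi^2_{n-I}$
(Remember from Section 3)
So a 100(1 - $\alpha$)% confidence interval for $\sigma^2$ is
$\left( \frac{(n-I)\,\hat{s}_R^2}{\chi^2_{\alpha/2;\,n-I}}, \; \frac{(n-I)\,\hat{s}_R^2}{\chi^2_{1-\alpha/2;\,n-I}} \right)$
where $\hat{s}_R^2$ is the residual variance and $\chi^2_{p;\,n-I}$ is the value
exceeded with probability p by a $\chi^2_{n-I}$ variable.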
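A minimal sketch of the interval, assuming the standard chi-square form: degrees of freedom times the residual variance, divided by the upper and lower chi-square table values. The function name and inputs are hypothetical, and the chi-square values are passed in as read from tables (here for 6 degrees of freedom and 95% confidence):

```python
def variance_ci(s_r2, df, chi2_upper, chi2_lower):
    """Interval (df * s_r2 / chi2_upper, df * s_r2 / chi2_lower) for the variance."""
    return df * s_r2 / chi2_upper, df * s_r2 / chi2_lower

# Hypothetical inputs: residual variance 1.0 with n - I = 6 degrees of freedom.
# Table values for 6 degrees of freedom: 14.449 is exceeded with probability
# 0.025 and 1.237 is exceeded with probability 0.975.
low, high = variance_ci(1.0, 6, 14.449, 1.237)
print(round(low, 3), round(high, 3))
```

The interval is not symmetric around the estimate, because the chi-square distribution is skewed.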

13
Example: Whisky Data (see next slide)

Construct 95% confidence intervals for:
The difference between the means of each pair of whiskies
(remember that our estimate for $\sigma^2$ is 9.356)
The variance $\sigma^2$
The Bonferroni method for the Whisky data:
How many t-tests are there?
What is $\alpha$ if $\alpha_T$ = 0.05?
Do the t-test with $\alpha$ for the JB and Glenfiddich
data. (You may need that $z_{0.992}$ = 2.41)

14
The Whisky Data
15
Model Diagnostics / Diagnosis del Modelo (1)

What have we done up to now?
Estimate the model parameters $\mu_1, \ldots, \mu_I$ and $\sigma^2$
(point and interval estimates)
Test to see if the means are different, using the
F-test
If H0 is rejected (i.e. means are different):
Test the difference of each pair of means using
the Bonferroni method
Estimate the difference in means (and give a
confidence interval for the difference)

16
Model Diagnostics (2)

We now must study if the basic assumptions / las
hipótesis básicas of the model are reasonable or
not.
Recall that the model for the data is
$y_{ij} = \mu_i + u_{ij}$
where the $u_{ij}$ are independent and normally
distributed with mean 0 and variance $\sigma^2$.

17
Model Diagnostics (3)

The perturbations $u_{ij}$ are estimated by the
residuals
$e_{ij} = y_{ij} - \bar{y}_{i.}$
So, if our model agrees with the data, the
residuals should be independent and normally
distributed with mean 0 and variance $\sigma^2$.

18
Model Diagnostics (4)

There is a problem with the residuals: their sum
within each group is always 0.
So they are not independent!
However, if n is big with respect to I, we can
consider them to be almost independent.
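The zero-sum constraint is easy to check numerically. This sketch assumes the residuals are each observation minus its own group mean, and uses made-up data:

```python
from statistics import mean

# Hypothetical data: three groups of three observations each.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 18.0, 16.0],
    "C": [20.0, 26.0, 23.0],
}

# Residual: observation minus the mean of its own group.
residuals = {name: [y - mean(ys) for y in ys] for name, ys in groups.items()}
group_sums = {name: sum(e) for name, e in residuals.items()}

print(group_sums)
```

Once all but one of the residuals in a group are known, the last one is determined, which is why the residuals cannot be fully independent.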

19
Model Diagnostics (5)

There are lots of ways to test the residuals.
Are they normally distributed? Draw a histogram
of the residuals. Does it look like a normal
distribution? Do a goodness of fit test for
the normal distribution.
Here are some possible problems that the
histogram can identify:
Residuals fall into 2 or more groups.
An outlier / valor atípico is present.
Residuals are not symmetric.

20
Model Diagnostics (6)

Are there outliers / valores atípicos? These are
values that are much larger or smaller than the
others. If they exist, you must investigate the
cause for such a value. If you can find no
reason for the outlier, then the model may not be
correct.

21
Model Diagnostics (7)

The variance must be the same in all the groups.
Plot the residuals against their group
mean: there should be no relationship between the
group mean and the variability in the residuals.
If the data are collected sequentially /
secuencialmente, plot the residuals against
time. If the observations are independent
then there should be no trend in the plot.
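Before drawing the plot, a quick numerical look at the same question can help. The sketch below tabulates each group's mean next to the standard deviation of its residuals. The data are made up and deliberately chosen so that the spread grows with the mean, which is the kind of pattern that would cast doubt on the equal-variance assumption:

```python
from statistics import mean, stdev

# Hypothetical data in which variability increases with the group mean.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 18.0, 16.0],
    "C": [20.0, 26.0, 23.0],
}

summary = []
for name, ys in groups.items():
    group_mean = mean(ys)
    residuals = [y - group_mean for y in ys]
    summary.append((name, group_mean, stdev(residuals)))

for name, group_mean, spread in summary:
    print(f"{name}: mean = {group_mean:.1f}, residual sd = {spread:.2f}")
```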

22
Residuals for the Whisky Data

The residuals for the whisky data are on the next
slide, then a histogram, then a plot of residuals
by level.
From the histogram and the plot by level:
Does it look like a normal distribution?
Are there outliers?
Is there a relationship between group and
variance?
Suppose we know that the data was collected
sequentially. Plot the residuals sequentially.

23
Residuals from the Whisky Example
24
Histogram of Residuals
25
Residuals for each Level
26
What if the Residuals are not Normally
Distributed?

If the residuals are not normally distributed,
then:
We cannot trust our estimate of $\sigma^2$, and therefore
also cannot trust the confidence intervals for
the $\mu_i$ and the differences in means.
However, the F test is still usually valid. This
is because the F test only relies on the Central
Limit Theorem. If the data are not normally
distributed, then the F test is still
approximately correct (and the approximation is
better if n is larger).

27
What if the Variances are not Equal for each
Group?

Clearly, confidence intervals that use the
estimate for $\sigma^2$ will be wrong.
If all groups have approximately the same number
of observations, i.e. $n_i \approx n / I$, $i = 1, \ldots, I$, then
the F test still works.
However, if the sizes of the groups are very
different (say $\max n_i / \min n_i > 2$), it is not
valid.
A formal test of equality of variances is usually
not worth it (this test needs normally
distributed data).
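The rule of thumb on group sizes can be written as a one-line check. This is only a sketch: the function names are hypothetical, and the threshold of 2 is the "say" value quoted above rather than a strict cutoff.

```python
def group_size_ratio(group_sizes):
    """Ratio of the largest group size to the smallest."""
    return max(group_sizes) / min(group_sizes)

def sizes_similar_enough(group_sizes, threshold=2.0):
    """True if max n_i / min n_i does not exceed the rule-of-thumb threshold."""
    return group_size_ratio(group_sizes) <= threshold

print(sizes_similar_enough([10, 12, 11]))  # ratio 1.2
print(sizes_similar_enough([5, 20, 8]))    # ratio 4.0
```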

28
What if the Observations are not Independent?

This is usually a serious problem. The formulae
for the confidence intervals for means are not
valid.
The F test is not valid either (the Central Limit
Theorem needs independence).
Remember that randomization / aleatorización is
the best way of avoiding this problem.
