1
5. Analysis of Variance IV / Análisis de Varianza IV
  • Professor Simon Wilson
  • Departamento de Estadística y Econometría

2
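Estimating $\mu$ when H0 is accepted
  • If we accept H0, then all levels have a common
    mean $\mu$.
  • Its estimate is the overall mean of the data,
    $\hat{\mu} = \bar{y}_{..} = \frac{1}{n} \sum_{i=1}^{I} \sum_{j=1}^{n_i} y_{ij}$
  • Then we can estimate $\sigma^2$ by
    $\hat{s}^2 = \frac{1}{n-1} \sum_{i=1}^{I} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2$
  • This has $n - 1$ degrees of freedom.
  • Use this to construct confidence intervals for $\mu$
    and $\sigma^2$ in the usual manner.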
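
As a rough illustration of these two estimates, the sketch below pools every observation and computes the overall mean and the variance with n - 1 degrees of freedom. The data values and the names `groups`, `pooled_mean` and `pooled_var` are invented for the example and do not come from the slides.

```python
from statistics import mean, variance

# Hypothetical data: three groups (levels), not taken from the slides.
groups = [
    [10.0, 12.0, 11.0],
    [13.0, 15.0, 14.0],
    [9.0, 11.0, 10.0],
]

# Under H0 all levels share one mean, so pool every observation.
pooled = [y for g in groups for y in g]
n = len(pooled)

pooled_mean = mean(pooled)      # estimate of the common mean
pooled_var = variance(pooled)   # sum of squared deviations / (n - 1)

print(n, pooled_mean, pooled_var)
```

For these invented numbers the sum of squared deviations is 32, so the variance estimate is 32 / 8 = 4.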

3
The coefficient of determination
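  • This is a measure of the relative size of the
    variability explained by the groups / una medida
    relativa de la variabilidad explicada por los
    grupos:
    $R^2 = \frac{\sum_{i=1}^{I} n_i (\bar{y}_{i.} - \bar{y}_{..})^2}{\sum_{i=1}^{I} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2}$
  • It is a number between 0 and 1.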
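
A minimal sketch of this ratio, assuming it is computed as the between-group sum of squares divided by the total sum of squares. The data and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical data: three groups, not taken from the slides.
groups = [
    [10.0, 12.0, 11.0],
    [13.0, 15.0, 14.0],
    [9.0, 11.0, 10.0],
]
pooled = [y for g in groups for y in g]
grand_mean = mean(pooled)

# Variability explained by the groups (between-group sum of squares).
explained = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Total variability around the overall mean.
total = sum((y - grand_mean) ** 2 for y in pooled)

r_squared = explained / total
print(round(r_squared, 4))
```

Here the explained sum of squares is 26 out of a total of 32, so about 81% of the variability is attributed to the groups.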

4
The efficiency / eficacia of the F test (1)
  • How good is this test?
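  • To simplify, assume that each group has $m$
    observations (so $n = mI$). Then
    $F = \frac{m \, s_m^2}{\hat{s}_R^2}$
  • where $s_m^2 = \frac{1}{I-1} \sum_{i=1}^{I} (\bar{y}_{i.} - \bar{y}_{..})^2$
    is the variance between the group means and $\hat{s}_R^2$
    is the residual variance (the within-group estimate of $\sigma^2$).
  • Clearly, for a given $s_m^2$ and $\hat{s}_R^2$, F increases
    as $m$ increases, so we are more likely to reject H0 as we
    increase $m$.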

5
The efficiency of the F test (2)
  • On the other hand, keeping $m$ fixed, F increases
    if the denominator decreases.
  • This happens if the experimental error $\sigma$ is
    smaller (since the denominator estimates $\sigma^2$).
  • So the power of the F test increases when we
      • increase the size of the data from each group
      • are able to reduce the experimental error.
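
The effect of the group size can be checked numerically. The sketch below assumes the balanced-design form of the statistic, F = m × (variance between the group means) / (residual variance). The function name `f_balanced` and the data are made up for illustration.

```python
from statistics import mean, variance

def f_balanced(groups):
    """F statistic for groups that all have the same size m."""
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("all groups must have the same size")
    group_means = [mean(g) for g in groups]
    between_var = variance(group_means)              # variance between the means
    resid_var = mean([variance(g) for g in groups])  # residual (within-group) variance
    return m * between_var / resid_var

# Hypothetical data: three groups of three observations.
groups = [[10.0, 12.0, 11.0], [13.0, 15.0, 14.0], [9.0, 11.0, 10.0]]
f_small = f_balanced(groups)

# Duplicate every observation: the group means are unchanged and m doubles
# (the residual variance estimate also shrinks slightly, from 1.0 to 0.8).
f_large = f_balanced([g + g for g in groups])

print(f_small, f_large)
```

With the same group means, going from m = 3 to m = 6 raises F from 13 to 32.5 in this example.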

6
Analysis of the Difference in Means
  • Let us suppose that we have rejected the null
    hypothesis. We believe that the means of the
    groups are different.
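  • The $100(1 - \alpha)\%$ confidence interval for the
    difference between two means $\mu_i$ and $\mu_k$ is
    $(\bar{y}_{i.} - \bar{y}_{k.}) \pm t_{\alpha/2} \, \hat{s}_R \sqrt{\frac{1}{n_i} + \frac{1}{n_k}}$
    where $\hat{s}_R^2$ is the residual variance.
  • Note: $t_{\alpha/2}$ comes from the t distribution with
    $n - I$ degrees of freedom.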
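
A small sketch of this interval, assuming the usual form (difference of group means) ± t × sqrt(residual variance × (1/n_i + 1/n_k)). The critical value is passed in from a t table because the Python standard library has no t quantile function. All inputs are hypothetical.

```python
from math import sqrt

def diff_ci(mean_i, mean_k, n_i, n_k, resid_var, t_crit):
    """Interval for mu_i - mu_k from the residual variance and a tabulated t value."""
    half_width = t_crit * sqrt(resid_var * (1.0 / n_i + 1.0 / n_k))
    diff = mean_i - mean_k
    return diff - half_width, diff + half_width

# Hypothetical inputs: group means 14 and 10, three observations each,
# residual variance 1.0 with n - I = 6 degrees of freedom.
# 2.447 is the tabulated t value for 6 degrees of freedom and alpha/2 = 0.025.
low, high = diff_ci(14.0, 10.0, 3, 3, 1.0, 2.447)
print(round(low, 3), round(high, 3))
```

The interval excludes 0 for these inputs, which would point to a real difference between the two means.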

7
Analysis of the Difference in Means: multiple
tests (1)
  • Remember that we can do a t-test to see if any
    two of the means are different (recall start of
    the whisky example).
  • There are $\binom{I}{2} = \frac{I(I-1)}{2}$ such pairs of means.
  • If H0 is true, and the level of significance is 5%,
    we accept H0 for each test with probability 0.95.
  • The probability that we accept H0 for all these
    tests is $0.95^{I(I-1)/2}$.

8
Analysis of the Difference in Means: multiple
tests (2)
  • This assumes that all the tests are independent.
  • So even when H0 is true, when we do multiple
    tests on pairs of means, we accept H0 as true
    only with probability 0.45 and not 0.95.
  • As $I \to \infty$, this probability $\to 0$.
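
To see how quickly this probability falls, the sketch below evaluates 0.95 raised to the number of pairs for a few group counts, assuming independent tests at the 5% level. The function name and the chosen group counts are illustrative. The transcript does not say how many groups lie behind the 0.45 figure. Six groups (15 pairs) give about 0.46.

```python
from math import comb

def prob_accept_all(num_groups, level=0.05):
    """P(no test rejects) for independent pairwise tests when H0 is true."""
    num_pairs = comb(num_groups, 2)   # I(I-1)/2 pairs of means
    return (1.0 - level) ** num_pairs

for num_groups in (2, 3, 6, 10):
    print(num_groups, comb(num_groups, 2), round(prob_accept_all(num_groups), 3))
```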

9
The Bonferroni Method (1)
  • The Bonferroni method is a way to solve this
    problem of multiple tests
  • It calculates a level of significance $\alpha$ for each
    test on pairs of means, such that the probability
    that H0 is accepted in all tests is at least $1 - \alpha_T$
    ($\alpha_T$ is the overall level of significance that we want).
  • If $c$ is the total number of tests, then we want
    $\alpha_T = P(\text{reject H0 at least once})$
    $= P(\text{reject H0 in test 1 OR reject H0 in test 2 OR ... OR reject H0 in test } c)$
    $\le P(\text{reject H0 in test 1}) + \dots + P(\text{reject H0 in test } c)$
    $= c\alpha$
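
The inequality can be checked numerically in the special case of independent tests, where the probability of at least one rejection has a closed form. This is only a sketch: the per-test level 0.01 and the test counts are arbitrary choices, and the bound itself does not need independence.

```python
def reject_at_least_once(alpha, num_tests):
    """Exact P(at least one rejection) for independent tests under H0."""
    return 1.0 - (1.0 - alpha) ** num_tests

alpha = 0.01
for num_tests in (1, 3, 10, 45):
    exact = reject_at_least_once(alpha, num_tests)
    bound = num_tests * alpha        # the upper bound c * alpha
    print(num_tests, round(exact, 4), round(bound, 4))
```

In each row the exact probability stays at or below c times the per-test level, and the gap widens as the number of tests grows.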

10
The Bonferroni Method (2)
  • So $\alpha = \alpha_T / c$ works.
  • Of course, when $c$ is large, this means $\alpha$ is very
    small, and the value of $t_{\alpha/2}$ is not in the
    tables.
  • In this case we use an approximation to $t_{\alpha/2}$
    based on $z_{\alpha}$ (from normal tables).
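
A minimal sketch of the per-test level and the matching normal quantile, using `statistics.NormalDist` from the Python standard library instead of printed tables. It stops at the normal quantile and applies no correction for the degrees of freedom. The choice of c = 3 tests is an assumption made for the example.

```python
from statistics import NormalDist

def bonferroni_z(alpha_total, num_tests):
    """Per-test level alpha_T / c and the two-sided normal critical value."""
    alpha = alpha_total / num_tests
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return alpha, z_crit

alpha, z_crit = bonferroni_z(0.05, 3)   # c = 3 is an assumed number of tests
print(round(alpha, 4), round(z_crit, 2))
```

With three tests the per-test level is about 0.0167, so the critical value is the normal quantile at cumulative probability of roughly 0.992.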

11
The Bonferroni Method (3)
  • So, when the F test rejects H0 at the level of
    significance $\alpha_T$, we can further investigate each
    pair of means using the t-test at level of
    significance $\alpha$.

12
Confidence Interval for the Variance
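  • Because
    $\frac{(n - I)\,\hat{s}_R^2}{\sigma^2} \sim \chi^2_{n-I}$
    where $\hat{s}_R^2$ is the residual variance
  • (Remember from Section 3)
  • So a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
    $\left[ \frac{(n - I)\,\hat{s}_R^2}{\chi^2_{n-I;\,1-\alpha/2}},\ \frac{(n - I)\,\hat{s}_R^2}{\chi^2_{n-I;\,\alpha/2}} \right]$
    where $\chi^2_{n-I;\,p}$ is the $p$ quantile of the $\chi^2_{n-I}$ distribution.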
13
Example: Whisky Data (see next slide)
  • Construct 95% confidence intervals for
      • the difference between the means of each pair of
        whiskies (remember that our estimate for $\sigma^2$ is 9.356)
      • the variance $\sigma^2$
  • The Bonferroni method for the whisky data
      • How many t-tests are there?
      • What is $\alpha$ if $\alpha_T = 0.05$?
      • Do the t-test with $\alpha$ for the JB and Glenfiddich
        data. (You may need that $z_{0.992} = 2.41$.)

14
The Whisky Data

15
Model Diagnostics / Diagnosis del Modelo (1)
  • What have we done up to now?
      • Estimate the model parameters $\mu_1, \dots, \mu_I$ and $\sigma^2$
        (point and interval estimates)
      • Test to see if the means are different, using the
        F-test
      • If H0 is rejected (i.e. means are different):
          • Test the difference of each pair of means using
            the Bonferroni method
          • Estimate the difference in means (and give a
            confidence interval for the difference)

16
Model Diagnostics (2)
  • We now must study if the basic assumptions / las
    hipótesis básicas of the model are reasonable or
    not.
  • Recall that the model for the data is
    $y_{ij} = \mu_i + u_{ij}$
  • where the $u_{ij}$ are independent and normally
    distributed with mean 0 and variance $\sigma^2$.

17
Model Diagnostics (3)
  • The perturbations $u_{ij}$ are estimated by the
    residuals
    $e_{ij} = y_{ij} - \bar{y}_{i.}$
  • So, if our model agrees with the data, the
    residuals should be independent and normally
    distributed with mean 0 and variance $\sigma^2$.

18
Model Diagnostics (4)
  • There is a problem with the residuals: their sum
    within each group is always 0.
  • So they are not independent!!
  • However, if n is big with respect to I, we can
    consider them to be almost independent

19
Model Diagnostics (5)
  • There are lots of ways to test the residuals
  • Are they normally distributed? Draw a histogram
    of the residuals. Does it look like a normal
    distribution? Do a goodness of fit test for
    the normal distribution.
  • Here are some possible problems that the
    histogram can reveal:
      • Residuals are in 2 or more groups.
      • An outlier / valor atípico is present.
      • Residuals are not symmetric.
20
Model Diagnostics (6)
  • Are there outliers / valores atípicos? These are
    values that are much larger or smaller than the
    others. If they exist, you must investigate the
    cause for such a value. If you can find no
    reason for the outlier, then the model may not be
    correct.

21
Model Diagnostics (7)
  • The variance must be the same in all the groups.
    Draw the residuals as a function of their group
    mean: there should be no relationship between the
    group mean and the variability in the residuals.
  • If the data is collected sequentially /
    secuencialmente, draw the residuals as a function
    of time. If the observations are independent
    then there should be no trend in the plot.
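
A numerical companion to the plot of residuals against group mean is to tabulate, for each group, its mean and the variance of its residuals. The data below are invented so that the spread grows with the mean, which is the pattern that would cast doubt on the equal-variance assumption.

```python
from statistics import mean, variance

# Hypothetical data in which the spread grows with the group mean.
groups = {
    "A": [9.0, 10.0, 11.0],
    "B": [18.0, 20.0, 22.0],
    "C": [26.0, 30.0, 34.0],
}

summary = []
for name, values in groups.items():
    centre = mean(values)
    resid = [y - centre for y in values]       # residuals of this group
    summary.append((name, centre, variance(resid)))

for name, centre, resid_var in summary:
    print(name, centre, resid_var)
```

Here the residual variance rises from 1 to 4 to 16 as the group mean rises from 10 to 30, so the variability clearly depends on the level.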

22
Residuals for the Whisky Data
  • The residuals for the whisky data are on the next
    slide, then a histogram, then a plot of residuals
    by level.
  • From the histogram:
      • Does it look like a normal distribution?
      • Are there outliers?
  • Is there a relationship between group and
    variance?
  • Suppose we know that the data was collected
    sequentially. Plot the residuals sequentially.

23
Residuals from the Whisky Example

24
Histogram of Residuals

25
Residuals for each Level

26
What if the Residuals are not Normally
Distributed?
  • If the residuals are not normally distributed,
    then
      • We cannot trust our estimate of $\sigma^2$, and therefore
        also cannot trust the confidence intervals for
        the $\mu_i$ and for the differences in means.
      • However, the F test is still usually valid. This
        is because the F test only relies on the Central
        Limit Theorem. If the data are not normally
        distributed, then the F test is still
        approximately correct (and the approximation is
        better if $n$ is larger).

27
What if the Variances are not Equal for each
Group?
  • Clearly, confidence intervals that use the
    estimate for $\sigma^2$ will be wrong.
  • If all groups have approximately the same number
    of observations, i.e. $n_i \approx n / I$, $i = 1, \dots, I$, then
    the F test still works.
  • However, if the sizes of the groups are very
    different (say $\max n_i / \min n_i > 2$), it is not
    valid.
  • A formal test of equality of variances is usually
    not worth it (this test needs normally
    distributed data).

28
What if the Observations are not Independent?
  • This is usually a serious problem. The formulae
    for the confidence intervals for means are not
    valid
  • The F test is not valid either (the Central Limit
    Theorem needs independence)
  • Remember that randomization / aleatorización is
    the best way of avoiding this problem