Basics of ANOVA

Provided by: gar115

1
Basics of ANOVA
  • Why ANOVA
  • Assumptions used in ANOVA
  • Various forms of ANOVA
  • Simple ANOVA tables
  • Interpretation of values in the table
  • R commands for ANOVA
  • Exercises

2
Why ANOVA
  • If we have two samples then, under mild conditions,
    we can use a t-test to test whether the difference
    between the means is significant. When there are
    more than two samples, repeated pairwise t-tests
    become unreliable: every extra comparison raises
    the chance of a spurious significant result.
  • ANalysis Of VAriance (ANOVA) is designed to test
    differences between means when there are many
    samples.
  • Example of ANOVA: suppose that we want to test the
    effect of various exercises on weight loss. We
    want to test 5 different exercises. We recruit 20
    men and assign four of them to each exercise.
    After a few weeks we record the weight loss. Let
    i = 1,2,3,4,5 denote the exercise number and
    j = 1,2,3,4 the person number. Then Yij is the
    weight loss for the jth person on the ith exercise
    programme. This is one-way balanced ANOVA: one-way
    because we have only one category (exercise
    programme), balanced because we have exactly the
    same number of men on each exercise programme.
  • Another example: now we subdivide each exercise
    into 4 subcategories. For each subcategory of the
    exercise we recruit four men. We measure the
    weight loss after a few weeks.
  • i: exercise category
  • j: exercise subcategory
  • k: kth man
  • Then Yijk is the weight loss for the kth man in the
    jth subcategory of the ith category. The number of
    observations is 5 x 4 x 4 = 80. This is two-fold
    nested ANOVA.
  • We want to test that a) there are no significant
    differences between categories and b) there are
    no significant differences between subcategories.
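
The two layouts described on this slide can be sketched in R as follows. This is only an illustration: the responses are simulated with rnorm, and the names oneway, nested, sub and loss are assumptions made for the example, not part of the original slides.

```r
set.seed(1)  # simulated responses only, not real measurements

# One-way balanced design: 5 exercises, 4 men on each (20 observations)
oneway <- data.frame(exercise = gl(5, 4, labels = paste0("E", 1:5)))
oneway$loss <- rnorm(nrow(oneway), mean = 2, sd = 1)
print(table(oneway$exercise))      # every exercise has 4 men: balanced

# Two-fold nested design: 4 subcategories inside each exercise,
# 4 men in each subcategory (5 x 4 x 4 = 80 observations)
nested <- expand.grid(man = 1:4, sub = factor(1:4), exercise = factor(1:5))
nested$loss <- rnorm(nrow(nested), mean = 2, sd = 1)

# Fit both models; exercise:sub means "subcategory within exercise"
fit_oneway <- lm(loss ~ exercise, data = oneway)
fit_nested <- lm(loss ~ exercise + exercise:sub, data = nested)

tab_oneway <- anova(fit_oneway)
tab_nested <- anova(fit_nested)
print(tab_oneway)
print(tab_nested)
```

In the nested table the exercise row has 5 - 1 = 4 degrees of freedom and the subcategory-within-exercise row has 5 x (4 - 1) = 15.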

3
Examples of ANOVA
  • One more example: we have 5 categories of
    exercises and 4 categories of diets. For each
    exercise-diet combination we recruit 4 persons.
    There will be 5 x 4 x 4 = 80 persons. This is
    two-way crossed ANOVA: two-way because we have
    categorised the persons in two ways, by exercise
    and by diet. This model is also balanced: we have
    exactly the same number of persons for each
    exercise-diet combination.
  • i: exercise number
  • j: diet number
  • k: kth person
  • Yijk: weight loss of the kth person on the ith
    exercise and jth diet.
  • In this case we can have two different types of
    hypothesis testing. Assume that the mean for each
    exercise-diet combination is $\mu_{ij}$. If we
    assume that the model is additive, i.e. the effects
    of exercise and diet add up, then we have
    $\mu_{ij} = \alpha_i + \beta_j$, where $\alpha_i$
    is the effect of the ith exercise and $\beta_j$ is
    the effect of the jth diet. Then we want to test
    the following hypotheses: a) $\mu_{ij}$ does not
    depend on exercise and b) $\mu_{ij}$ does not
    depend on diet.
  • Sometimes we do not want to assume additivity.
    Then we want to test one more hypothesis: the
    model is additive. If the model is not additive
    then the other hypotheses may be hard to
    interpret. In this case it might be useful to
    transform the data to make the model additive.
  • Models used for ANOVA can be made more and more
    complicated. We can design three-way or four-way
    crossed models or nested models. We can combine
    nested and crossed models together. The number of
    possible ANOVA models is very large.
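
A minimal sketch of the additivity test for the two-way crossed design, assuming simulated data whose true effects are additive. The effect sizes ex_eff and di_eff and all variable names are invented for the illustration.

```r
set.seed(2)  # simulated data only

# 5 exercises x 4 diets x 4 persons per combination = 80 observations
crossed <- expand.grid(person = 1:4,
                       diet = factor(1:4),
                       exercise = factor(1:5))

# Hypothetical additive effects: mean_ij = exercise effect + diet effect
ex_eff <- c(0, 0.5, 1.0, 1.5, 2.0)
di_eff <- c(0, 0.3, 0.6, 0.9)
crossed$loss <- ex_eff[as.integer(crossed$exercise)] +
                di_eff[as.integer(crossed$diet)] +
                rnorm(nrow(crossed), sd = 0.5)

fit_add  <- lm(loss ~ exercise + diet, data = crossed)  # additive model
fit_full <- lm(loss ~ exercise * diet, data = crossed)  # with interaction

# F-test of the hypothesis "the model is additive":
# compares the additive model against the model with interaction
cmp <- anova(fit_add, fit_full)
print(cmp)

# Tests of the exercise and diet effects under additivity
print(anova(fit_add))
```

The comparison uses (5 - 1) x (4 - 1) = 12 degrees of freedom for the interaction. A small p-value in cmp would speak against additivity.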

4
Assumptions
  • ANOVA models are special cases of linear models.
    We can write the model as
    $Y = \mu + \varepsilon$
  • where Y is the observation vector, $\mu$ is the
    vector of means, composed of the treatment means,
    and $\varepsilon$ is the error vector. The basic
    assumptions in ANOVA models are:
  • 1) The expected values of the errors are 0
  • 2) The variances of all errors are equal to each other
  • 3) The errors are independent
  • 4) The errors are normally distributed
  • All ANOVA procedures are very sensitive to
    assumptions 1)-3). F-tests are meant to be robust
    against violations of assumption 4): if
    assumptions 1)-3) are valid then the tests remain
    valid at least asymptotically, i.e. for a large
    number of observations, even when 4) does not
    hold.
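
Some of these assumptions can be examined from the residuals of a fitted model. The sketch below uses the PlantGrowth data shipped with R. The choice of Bartlett's test for equal variances and the Shapiro-Wilk test for normality is an assumption of this example, not something prescribed by the slides, and independence (assumption 3) usually has to be judged from the design of the experiment rather than from a test.

```r
fit <- lm(weight ~ group, data = PlantGrowth)
res <- residuals(fit)

# 1) Residuals of a model with an intercept average to zero by
#    construction, so this check only confirms the fit
print(mean(res))

# 2) Equal variances across groups (Bartlett's test)
bt <- bartlett.test(weight ~ group, data = PlantGrowth)
print(bt)

# 4) Normality of the residuals (Shapiro-Wilk test and a Q-Q plot)
sw <- shapiro.test(res)
print(sw)
qqnorm(res)
qqline(res)

# Residuals against fitted values: a funnel shape suggests unequal variances
plot(fitted(fit), res, xlab = "fitted value", ylab = "residual")
```

Small p-values from bartlett.test or shapiro.test indicate a possible violation. Large p-values do not prove that the assumption holds.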

5
ANOVA tables
  • Standard ANOVA tables look like

    effect  df  SSh  MS              F        prob
    v1      d1  SS1  MS1 = SS1/d1    MS1/MSe  pr1
    ...     ... ...  ...
    vp      dp  SSp  MSp = SSp/dp    MSp/MSe  prp
    error   de  SSe  MSe = SSe/de
    total   N   SSt

  • where v1,...,vp are the values we want to test for
    being 0. df is the degrees of freedom
    corresponding to this value. SSh is the sum of
    squares corresponding to this value (h denotes
    hypothesis). MS is the mean square, i.e. the sum
    of squares divided by its degrees of freedom. F
    is the F-value we want to test; its degrees of
    freedom are (di,de). prob is the corresponding
    probability. If the probability is very low then
    we reject the hypothesis that this value is 0. If
    the value of prob is high enough then we do not
    reject the null hypothesis.
  • These values are calculated using the likelihood
    ratio test. Let us say we want to test the
    hypothesis
  • $H_0: v_i = 0$ vs $H_1: v_i \neq 0$
  • We maximise the likelihood under the null
    hypothesis and find the corresponding variance,
    then we maximise the likelihood under the
    alternative hypothesis and find the corresponding
    variance. Then we calculate the sums of squares
    for the null and alternative hypotheses and find
    the F-statistic.
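
To make the columns of the table concrete, the sketch below builds a one-way ANOVA table by hand for the PlantGrowth data shipped with R and compares it with the output of anova(). The variable names (SSh, SSe, dh, de and so on) follow the table above; the choice of data set is an assumption of this example.

```r
y <- PlantGrowth$weight
g <- PlantGrowth$group

N <- length(y)       # number of observations
p <- nlevels(g)      # number of groups

group_means <- tapply(y, g, mean)
group_n     <- tapply(y, g, length)

# Sum of squares for the hypothesis (between groups) and for the error
SSh <- sum(group_n * (group_means - mean(y))^2)
SSe <- sum((y - ave(y, g))^2)   # ave() gives each observation its group mean

dh <- p - 1          # degrees of freedom of the hypothesis
de <- N - p          # degrees of freedom of the error

MSh  <- SSh / dh
MSe  <- SSe / de
Fval <- MSh / MSe
prob <- pf(Fval, dh, de, lower.tail = FALSE)   # upper tail of F(dh, de)

by_hand <- data.frame(effect = c("group", "error"),
                      df = c(dh, de),
                      SS = c(SSh, SSe),
                      MS = c(MSh, MSe),
                      F = c(Fval, NA),
                      prob = c(prob, NA))
print(by_hand)

# The same table from R's own routines
tab <- anova(lm(weight ~ group, data = PlantGrowth))
print(tab)
```

The two printed tables should agree up to rounding, which is a useful check that each column means what the slide says it means.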
6
LR test for ANOVA
  • Suppose the variances have been estimated by
    maximum likelihood under the null and under the
    alternative hypotheses.
  • The mean sums of squares for the hypothesis and
    for the error are then formed from the
    corresponding sums of squares.
  • Under the null hypothesis the first sum of
    squares, scaled by the error variance, has a
    $\chi^2$ distribution with dfh degrees of
    freedom, the second has a $\chi^2$ distribution
    with dfe degrees of freedom, and they are
    independent. The ratio of the mean squares
    therefore has an F-distribution with degrees of
    freedom (dfh,dfe). In the simplest case the
    degrees of freedom of the hypothesis is the
    number of levels in the category minus 1.
  • Using this type of ANOVA table we can only tell
    whether there are significant differences between
    the means. It does not tell us which mean is
    significantly different.
  • This ratio has an F-distribution if the null
    hypothesis is true. Otherwise it has a
    non-central F-distribution.
  • The degrees of freedom of the hypothesis is
    defined by the number of constraints it implies.
    The degrees of freedom of the error is, as usual,
    the number of observations minus the number of
    parameters.
7
Example Two way ANOVA
  • Let us consider an example taken from Box, Hunter
    and Hunter. The experiment was done on animals:
    survival times of the animals were recorded for
    various poisons and treatments. The table is

                     treatment
    poison     A      B      C      D
    I        0.31   0.82   0.43   0.45
             0.45   1.10   0.45   0.71
             0.46   0.88   0.63   0.66
             0.43   0.72   0.76   0.62
    II       0.36   0.92   0.44   0.56
             0.29   0.61   0.35   1.02
             0.40   0.49   0.31   0.71
             0.23   1.24   0.40   0.38
    III      0.22   0.30   0.23   0.30
             0.21   0.37   0.25   0.36
             0.18   0.38   0.24   0.31
             0.23   0.29   0.22   0.33

8
ANOVA table
  • ANOVA table produced by R:

                 Df   Sum Sq  Mean Sq  F value     Pr(>F)
    pois          2  1.03828  0.51914  22.5135  4.551e-07
    treat         3  0.92569  0.30856  13.3814  5.057e-06
    pois:treat    6  0.25580  0.04263   1.8489     0.1170
    Residuals    36  0.83013  0.02306

  • The most important values are F and Pr(>F).
  • In this table we have tests for pois and treat.
    Moreover we have the interaction between these
    two categories. Interaction means that it would
    be difficult to separate the effects of these two
    categories: they should be considered
    simultaneously. Pr(>F) for the interaction is not
    very small, but it is not large enough to discard
    interaction effects either. In these situations a
    transformation of the variables might help. Let
    us consider the ANOVA table for the transformed
    observations, using the transformation 1/y. Now
    the ANOVA table looks like

                 Df  Sum Sq  Mean Sq  F value     Pr(>F)
    pois          2  34.903   17.452  72.2347  2.501e-13
    treat         3  20.449    6.816  28.2131  1.457e-09
    pois:treat    6   1.579    0.263   1.0892     0.3874
    Residuals    36   8.697    0.242

9
ANOVA table
  • According to this table Pr(>F) corresponding to
    the interaction term is high. It means that the
    interaction for the transformed variables is not
    significant, so we can drop the interaction
    terms and build the ANOVA table without them. It
    will look like

                 Df  Sum Sq  Mean Sq  F value     Pr(>F)
    pois          2  34.903   17.452   71.326  3.124e-14
    treat         3  20.449    6.816   27.858  4.456e-10
    Residuals    42  10.276    0.245

  • Now we can say that there are significant
    differences between poisons and between
    treatments.
  • Sometimes it is wise to use a transformation to
    reduce the effect of interactions. Several
    different transformations (inverse, inverse
    square, log) could be tried, building an ANOVA
    table for each of them. Then by inspection you
    can decide which transformation gives better
    results. The following argument could be used to
    justify a transformation: if the effects of two
    different categories are multiplicative then on
    the log scale they become additive. Additive
    effects are easier to interpret than others.
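
The three analyses of the poison data can be sketched in R as follows. The survival times are copied from the table on slide 7; the variable names y, pois and treat are chosen to match the row labels of the printed tables. The printed numbers have not been checked digit for digit against the tables in these slides, so small differences are possible.

```r
# Survival times, entered poison by poison and, within each poison,
# treatment by treatment (4 animals per poison-treatment combination)
y <- c(
  # poison I: treatments A, B, C, D
  0.31, 0.45, 0.46, 0.43,   0.82, 1.10, 0.88, 0.72,
  0.43, 0.45, 0.63, 0.76,   0.45, 0.71, 0.66, 0.62,
  # poison II: treatments A, B, C, D
  0.36, 0.29, 0.40, 0.23,   0.92, 0.61, 0.49, 1.24,
  0.44, 0.35, 0.31, 0.40,   0.56, 1.02, 0.71, 0.38,
  # poison III: treatments A, B, C, D
  0.22, 0.21, 0.18, 0.23,   0.30, 0.37, 0.38, 0.29,
  0.23, 0.25, 0.24, 0.22,   0.30, 0.36, 0.31, 0.33
)
pois  <- gl(3, 16, labels = c("I", "II", "III"))
treat <- gl(4, 4, length = 48, labels = c("A", "B", "C", "D"))

fit_raw <- lm(y ~ pois * treat)       # raw data, with interaction
fit_inv <- lm(1 / y ~ pois * treat)   # transformed data, with interaction
fit_add <- lm(1 / y ~ pois + treat)   # transformed data, additive model

tab_raw <- anova(fit_raw)
tab_inv <- anova(fit_inv)
tab_add <- anova(fit_add)
print(tab_raw)
print(tab_inv)
print(tab_add)
```

Replacing 1 / y with log(y) or 1 / y^2 in the formulas gives the tables for the other transformations mentioned above.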

10
R commands for ANOVA
  • There are basically two types of commands in R.
    The first fits a general linear model and the
    second analyses the results.
  • The command to fit a linear model is lm and it is
    used as
  • lm(data ~ formula)
  • The formula defines the design matrix. See
    help(formula). For example for the PlantGrowth
    data (available in R) we can use
  • data(PlantGrowth) - load the data into R from the
    standard package
  • lmPlant <- lm(PlantGrowth$weight ~ PlantGrowth$group)
  • Then the linear model will be fitted to the data
    and the result will be stored in lmPlant
  • Now we can analyse the result
  • anova(lmPlant) - will give the ANOVA table
  • If there is more than one factor (category) then
    for a two-way crossed design we can use
  • lm(data ~ f1*f2) - fits the complete model with
    interactions
  • lm(data ~ f1+f2) - fits only the additive model
  • lm(data ~ f1+f1:f2) - fits f1 and the interaction
    between f1 and f2. It is used for nested models.
  • Other useful commands for linear models and
    their analysis are
  • summary(lmPlant) - gives a summary of the fit
  • plot(lmPlant) - produces several useful plots
  • Please let me know if any of the results are not
    clear; then we can discuss them and try to sort
    out the problems.
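
The three formula forms can be tried on any data set with two factors. The sketch below uses warpbreaks, a data set shipped with R (response breaks, factors wool and tension). It is chosen purely for illustration and does not appear in the original slides.

```r
data(warpbreaks)   # breaks: response; wool (2 levels), tension (3 levels)

full   <- lm(breaks ~ wool * tension, data = warpbreaks)        # with interaction
addit  <- lm(breaks ~ wool + tension, data = warpbreaks)        # additive only
nested <- lm(breaks ~ wool + wool:tension, data = warpbreaks)   # tension within wool

# Which terms does each formula expand to?
print(attr(terms(full), "term.labels"))     # "wool" "tension" "wool:tension"
print(attr(terms(addit), "term.labels"))    # "wool" "tension"
print(attr(terms(nested), "term.labels"))   # "wool" "wool:tension"

# ANOVA tables for the three fits
tab_full   <- anova(full)
tab_addit  <- anova(addit)
tab_nested <- anova(nested)
print(tab_full)
print(tab_addit)
print(tab_nested)

# Coefficient-level summary of one of the fits
print(summary(addit))
```

Passing the data frame through the data argument, rather than writing warpbreaks$breaks inside the formula, keeps the term names short in the printed tables.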

11
Exercise 3.
  • a) Analyse these data using ANOVA
  • http://www.ysbl.york.ac.uk/garib/mres_course/2004/exercise_3a.html
  • What do you think about the differences?
  • b) Analyse these data
  • http://www.ysbl.york.ac.uk/garib/mres_course/2004/exercise_3b.html
  • What do you think about the differences?

12
References
  1. Stuart, A., Ord, J.K., Arnold, S. (1999) Kendall's
    Advanced Theory of Statistics, Volume 2A
  2. Box, G.E.P., Hunter, W.G., Hunter, J.S. (1978)
    Statistics for Experimenters