Title: Experimental designs and Analysis of Variance: Assumptions and power


1
Experimental designs and Analysis of Variance: Assumptions and power
2
Overview
  • Assumptions in ANOVA
  • Independence
  • Normality
  • Homogeneity of variance
  • Statistical power
  • Determinants of effect size
  • A priori and post hoc power analysis

3
Model assumptions
  • 1. Independence: the errors eij (equivalently, the observations) are
    independent
  • 2. Identical distributions: within and between groups, the errors
    eij have the same distribution
  • → between groups this implies equal variances
  • 3. Homogeneity of variance: the variance of eij is the same in all
    groups
  • 4. Normality: the errors are normally distributed with mean 0 and
    standard deviation σ (the model is written out below)
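
For reference, assumptions 1-4 refer to the usual one-way ANOVA model; this is a standard formulation, not spelled out on the slide itself:

```latex
% One-way ANOVA model: observation i in group j
y_{ij} = \mu + \alpha_j + e_{ij}, \qquad
e_{ij} \stackrel{\text{iid}}{\sim} N(0, \sigma^2),
\quad i = 1, \dots, n_j, \; j = 1, \dots, k
```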

4
Model assumptions
  • Effect on inferential procedures (tests and confidence intervals)
  • Type I error: nominal α level (the chosen level of significance)
    versus actual α level (under violations)
  • Type II error (power): is power decreased by violations?
  • A technique is robust against violations of the assumptions if
  • the nominal and actual α levels are equal
  • the power is not affected

5
Model assumptions
  • A test (or test statistic) is called liberal when the nominal α is
    smaller than the actual α
  • consequence: too often falsely rejecting H0
  • A test (or test statistic) is called conservative when the nominal α
    is larger than the actual α
  • consequence: too often falsely not rejecting H0

6
Independence assumption
  • No relationship between cases
  • Effect of violation
  • estimators of standard errors are biased
  • (estimates are generally too small)
  • actual α level too large
  • (too often falsely rejecting H0)
  • Detection of violation
  • examine the experimental design, the sample, and the data collection
  • intraclass correlation

7
Independence assumption
  • Intraclass correlation: the correlation within classes (groups of
    cases) in the data set
  • "The degree of resemblance between micro-units belonging to the same
    macro-unit" (the groups) (Snijders & Bosker, 1999)
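
A minimal sketch of how this resemblance could be quantified as ICC(1), estimated from one-way ANOVA mean squares; the function name icc1 and the equal-group-size restriction are ours, not from the slides:

```python
import numpy as np

def icc1(groups):
    """ICC(1): resemblance of cases within the same macro-unit,
    estimated from one-way ANOVA mean squares (equal group sizes)."""
    k = len(groups)                 # number of macro-units (groups)
    n = len(groups[0])              # cases per group, assumed equal here
    grand = np.mean(np.concatenate(groups))
    ms_between = n * sum((g.mean() - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)
```

Values near 0 suggest independence is plausible; clearly positive values signal dependence within groups.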

8
Independence assumption
  • Experimental design
  • "Whenever the treatment is individually administered, observations
    are independent. But where treatments involve interaction among
    persons, such as discussion methods or group counseling, the
    observations may influence each other" (Glass & Hopkins, 1984)
  • Multilevel analysis (multiple data levels)
  • Network analysis (interactions between cases)
  • Correction of violation
  • test at a more stringent level of significance, i.e. a smaller
    nominal α level (so that the actual α takes on a normal value)
  • consequence: decreased power

9
Normality assumption
  • In each group the dependent variable follows a normal distribution
    (mean µj and variance σ²)
  • Effect of violation
  • ANOVA is robust with respect to the Type I error
  • power is attenuated by kurtosis
  • Central Limit Theorem
  • the mean (sum) of independent observations approaches a normal
    distribution as the sample size n increases
  • Detection of violation
  • graphs: probability plots (PP, QQ), histogram, box plot
  • statistics: skewness, kurtosis
  • tests

10
Testing Normality
  • Tests for normality
  • Chi-square test
  • Kolmogorov-Smirnov test
  • Shapiro-Wilk test
  • Test H0: the variable follows a normal distribution vs.
    Ha: the variable is not normally distributed
  • The Shapiro-Wilk test is "the best currently available procedure for
    testing normality" (Miller, 1997)
  • Wrong? → Not rejecting H0 does NOT equal accepting it!

11
Checking Normality
  • Tests do not give information on the degree or direction of the
    violation
  • statistics like skewness and kurtosis do
  • graphical inspections do
  • Use combinations
  • Stevens advises using the Shapiro-Wilk test in combination with
    skewness and kurtosis (see the sketch below)
  • this combination has enough power to detect non-normality (in small
    samples: extreme non-normality)
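
A minimal sketch of the combination Stevens recommends, using SciPy; the exponential sample is an invented illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=40)      # deliberately skewed sample

w, p = stats.shapiro(y)                      # Shapiro-Wilk test of normality
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.4f}")
print(f"skewness = {stats.skew(y):.2f}")             # 0 under normality
print(f"excess kurtosis = {stats.kurtosis(y):.2f}")  # 0 under normality
```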

12
Normality
  • Correction of violation
  • transformation of the dependent variable (Stevens gives examples for
    standardized variables)
  • consequence: more difficult to interpret
  • collect more data
  • data trimming (remove the largest values: outliers)
  • check whether the data contain outliers (and remove them...??)
  • Outliers (on the dependent variable y)
  • inspect standardized or studentized residuals
  • → flag values outside −2 to 2 or −3 to 3 (see the sketch below)
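
A small sketch of this residual check; the helper name and the default cutoff of 2 are illustrative:

```python
import numpy as np

def flag_outliers(y, cutoff=2.0):
    """Flag cases whose standardized residual lies outside [-cutoff, cutoff]."""
    z = (y - y.mean()) / y.std(ddof=1)   # standardized residuals within one group
    return np.abs(z) > cutoff
```

Usage: `y[flag_outliers(y, cutoff=3.0)]` lists the candidate outliers under the stricter bound.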

13
Homogeneity of Variance
  • In each group the dependent variable follows a normal distribution
    with constant variance σ²
  • Effect of violation
  • by far the most serious problem
  • inflation of the actual Type I error rate
  • more severe when the sample sizes are unequal
  • with unequal group sizes and unequal variances:
  • F is liberal if the large variances appear in the small groups
  • F is conservative if the large variances appear in the large groups

14
Homogeneity of Variance
  • Detection of violation
  • check the sample variances (or SDs)
  • rule of thumb: the largest SD should be smaller than 2 times the
    smallest SD (Moore & McCabe, 2003)
  • rule of thumb: sample sizes should be approximately equal
    (largest/smallest ratio smaller than 1.5; Stevens, 2002)
  • Detection with the Levene test
  • robust against non-normality
  • less robust against skewness → use medians instead of means
  • Is using this test correct?? H0: the variances are equal
  • you don't want to reject, but not rejecting does NOT equal accepting
    the hypothesis! (see the sketch below)
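
A minimal sketch with SciPy; center='median' gives the skewness-robust median-based variant mentioned above (the three samples are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(0, 1.0, 20)
g2 = rng.normal(0, 1.5, 20)
g3 = rng.normal(0, 3.0, 20)

# Levene test of H0: equal variances, using medians instead of means
stat, p = stats.levene(g1, g2, g3, center='median')
print(f"Levene: W = {stat:.2f}, p = {p:.4f}")
```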

15
Homogeneity of Variance
  • Correction of violation
  • use a more stringent significance level
  • transformation of the individual variables, to stabilize the
    variances (EXAMINE in SPSS)
  • based on the relationship between means and SDs:
  • counts → square root
  • response times → logarithm
  • proportions → arcsine transformation
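
The three transformations, sketched with NumPy; the arrays are invented examples:

```python
import numpy as np

counts = np.array([0, 3, 7, 12, 25])
rts = np.array([350.0, 420.0, 610.0, 1450.0])   # response times in ms
props = np.array([0.05, 0.30, 0.50, 0.95])

sqrt_counts = np.sqrt(counts)           # counts -> square root
log_rts = np.log(rts)                   # response times -> logarithm
asin_props = np.arcsin(np.sqrt(props))  # proportions -> arcsine of the square root
```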

16
Significance testing
  • Type I error: α = P(reject H0 | H0 is true)
  • fixed at level α, the significance level
  • Type II error: β = P(not reject H0 | H0 is false)
  • increases (sharply) if α decreases

17
Statistical Power
  • Not much attention is paid to the Type II error
  • Power: 1 − β = P(reject H0 | H0 is false)

18
Statistical Power
  • Power depends on
  • 1. the significance level α
  • 2. the sample size n
  • 3. the effect size: which effect (with respect to size) has to be
    found in the population
  • 4. the nature of the test
  • Power analysis
  • before testing: which power should your test have? → which sample
    size is needed?
  • after testing: was it possible to find an effect?

19
Power: Sample size
n increases → power increases
20
Power Significance level
? increases ? power increases
21
Power Effect size
effect size increases ? power increases
22
Effect size
  • Effect sizes in ANOVA (SPSS)
  • d: difference between means, in units of SD
  • η² = SSA / SST: the correlation ratio (R² in regression)
  • small (0.01), medium (0.06), large (0.15)
  • ηp²: small (0.01), medium (0.06), large (0.14)
  • ηp² = SSA / (SSA + SSE)
  • ω² gives better estimates of the population effect; the others
    overestimate the effect size
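
A minimal sketch of η² computed from one-way ANOVA sums of squares; the data are invented:

```python
import numpy as np

groups = [np.array([4.0, 5.1, 6.2]),
          np.array([5.5, 6.8, 7.0]),
          np.array([7.9, 8.4, 9.1])]

all_y = np.concatenate(groups)
grand = all_y.mean()
ss_a = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # between-groups SS
ss_t = ((all_y - grand) ** 2).sum()                           # total SS
print(f"eta squared = {ss_a / ss_t:.3f}")
```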

23
Improving Power
  • Increase sample sizes (group sizes)
  • Use a less stringent significance level α
  • Use one-sided tests instead of two-sided (if
    possible)
  • Decrease error variance (noise) by
  • making homogeneous groups
  • using factorial designs or repeated-measures
    designs
  • using covariates
  • Try to increase effect sizes by using the independent variables
    correctly
  • Use a smaller number of dependent variables

24
Estimating Power
  • What is needed to calculate power?
  • the significance level
  • the sample size
  • the distribution under Ha → effect sizes (so-called noncentral
    distributions)
  • Cohen: "The degree to which H0 is false is indexed by the discrepancy
    between H0 and Ha and is called ES."
  • Observed power (SPSS) uses the observed effect size
  • Use power tables or special programs (e.g. G*Power); a sketch follows
    below
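
A sketch of such a calculation with statsmodels, as an alternative to tables or G*Power; Cohen's f = 0.25 (medium) and k = 3 groups are assumed for illustration:

```python
from statsmodels.stats.power import FTestAnovaPower

# total N needed for a one-way ANOVA:
# Cohen's f = 0.25 (medium), alpha = 0.05, power = 0.80, 3 groups
n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=3)
print(f"total N = {n_total:.1f}")   # round up and divide over the groups
```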

25
Example
  • t test
  • effect size d
  • α = 0.05
  • n1 = n2 = 15
  • post hoc (see the sketch below)
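
A sketch of the post hoc calculation for this example; the observed effect size d = 0.8 is an assumed illustration, since the slide leaves d unspecified:

```python
from statsmodels.stats.power import TTestIndPower

# achieved power of an independent-samples t test:
# alpha = 0.05, n1 = n2 = 15, observed d = 0.8 (assumed)
power = TTestIndPower().power(effect_size=0.8, nobs1=15, ratio=1.0, alpha=0.05)
print(f"achieved power = {power:.2f}")
```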

26
Post hoc power analysis
  • Also known as retrospective power
  • Based on the observed effect size in the sample, which is only an
    estimate of the true effect size
  • Useful as an a priori analysis for the next experiment
  • Performed after the test, so the result of the test is already
    known, and retrospective power offers no additional information for
    explaining non-significant results
  • Better to calculate confidence intervals (or ESs)

27
Example
  • t test
  • effect d = 1
  • α = ?
  • n1 = n2 = ?
  • a priori

d = 1, α = 0.05, power = 0.80 → n = 34 (17 per group)
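
The same result from statsmodels; a sketch, with a two-sided test assumed:

```python
from statsmodels.stats.power import TTestIndPower

# required n per group for d = 1, alpha = 0.05, power = 0.80
n1 = TTestIndPower().solve_power(effect_size=1.0, alpha=0.05, power=0.80)
print(f"n per group = {n1:.1f}")    # about 16.7 -> 17 per group, 34 in total
```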
28
A priori power analysis
  • Controlling statistical power before a study is actually conducted
  • Compute the sample size n as a function of the required power level,
    the pre-specified α level, and the population effect size to be
    detected
  • Cohen (1992) gives a table with sample sizes for several tests at a
    power of 0.80, for small, medium, and large effects

29
Statistical Power
  • Stevens: "Several low-power studies that report non-significant
    results of the same character are evidence for an effect."
  • Research shows that small and medium effect sizes are very common in
    social science research, and therefore there is a large probability
    that the power is small
  • Cohen found the mean power to detect medium effect sizes to be 0.48.
    Thus the chance of obtaining a significant result was about that of
    tossing a head with a fair coin.

30
Statistical Power
  • Sedlmeier and Gigerenzer: the power of studies has not increased
    over the past 24 years
  • Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical
    power have an effect on the power of studies? Psychological
    Bulletin, 105, 309-316.
  • The median power they found in their meta-analysis is 0.37
  • Only 2 out of 64 experiments mentioned power, and it was never
    estimated
  • Non-significance was generally interpreted as confirmation of the
    null hypothesis, although the median power was as low as 0.25

31
Statistical Power
  • Why do psychologists neglect power?
  • S&G: "The ongoing neglect of power seems to be a direct consequence
    of the attempt to fuse opposing theories (fighting camps!) into a
    single truth, which generated confusion and illusions about the
    meaning of the basic concepts"
  • Do we need power? It depends on whether researchers believe that
    they have to make a yes/no decision after an experiment
    (reject / accept)